Adding Stored Procedures to SQLite

After reading some traffic on the sqlite-users list, where some people were inquiring about stored procedures for SQLite, I decided I would try to implement stored procedures on SQLite. The primary motivation for me was just to see how far I could get – I am not convinced that it’s necessarily a good idea.

The first step was to examine the existing SQLite grammar in src/parse.y and add the additional grammar required for stored procedures, then build out the code from there.

The idea is to be able to support multiple stored procedure language implementations, where each implementation, except the special, ‘sqlite’, are implemented as SQLite extensions – i.e. loadable shared libraries, which embed interpreted languages – e.g. Python, Lua, Perl, etc. Language-specific dispatch is keyed off the ‘LANGUAGE’ specifier in the CREATE PROCEDURE DDL statement and is stored as metadata in the stored proc schema table. (‘sp_schema’)

SQLite Grammar Additions for Stored Procedures:

The grammar to create a stored procedure is similar to the grammar used by PostgreSQL for user-defined function creation:

CREATE [ OR REPLACE ] PROC[EDURE] [ IF NOT EXISTS ]
  [dbname.] procedure_name ( [ [ argname ] argtype [, ...] ] )
  [ RETURNS rettype ] AS $$ 
  'proc body text' 
  $$
  LANGUAGE langname

The key feature here, is that the whole body of the stored procedure is placed between two double dollar-signs, used as a delimiter of a block of opaque text – this approach makes it easy to treat the whole of the stored procedure body text as a single token to the lexical scanner (a/k/a tokenizer, lexer).

Other grammars can do without the \$$….$$ escape trick, by implementing “start states” in the lexical scanner. Since this is a POC and another major db vendor (PostegreSQL) uses the escape characters trick, that’s how I did it.

The grammar to execute a stored procedure is similar to Sybase (or MS SQLServer):

exec[ute] [@return_status  = ]
 [dbname.]procedure_name
      [[@parameter_name =] value | 
           [@parameter_name =] @variable
      [, [@parameter_name =] value | 
           [@parameter_name =] @variable] 

Note that this implementation mandates that stored procedures can only be invoked in a statement, not an expression. (just like Sybase/SQL Server – but unlike PostgreSQL, whose user-defined functions can be invoked in an expression) This limitation makes the implementation easier.

Also note that even though the grammar can consume variable definitions, for example, return status, and expressions with variables – in SQLite itself, there is no concept of session variables, so these constructs will bomb if you attempt to use them.

The grammar to drop a stored procedure is also similar to Sybase:

drop proc[edure] [dbname.] procedure_name

In this release, there are only two language implementations: ‘sqlite’ and ‘python’. ‘sqlite’ simply indicates that the stored procedure body is a semicolon-delimited batch of sql statements.

The ‘sqlite’ language is implemented via the internal routine, ‘sqlite3NestedParse’. Since the SQLite grammar does not implement control-flow statements, the utility of language type ‘sqlite’ is limited.

As expected, there are limitations – for example, with language type ‘sqlite’, if multiple select statements are batched together, they must all return the same number of columns.

Here is a basic example:

$ ./sqlite3 test.db
SQLite version 3.7.3.p1
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create proc foo() as $$select 'foo'$$ language sqlite;
sqlite> exec foo();
foo
sqlite> drop proc foo;
sqlite> 

The Python language implementation is accomplished by using a modified version of Gerhard Häring’s PySqlite, which is an implementation of the Python Database API 2.0. I believe this approach is better then using the APSW API, since the Python DBAPI 2.0 spec is database-neutral, thus porting database application logic to python stored procedures will be more streamlined. (in case this crazy idea ever gets fully realized)

I also had problems hacking the APSW module to accept a sqlite3* pointer, which is another reason I used the pysqlite2 module instead.

Here is an example of a Python stored procedure (using old-style exception handling):

create or replace proc pytest3() returns resultset as $$
  from pysqlite2 import dbapi2 as sqlite3
  import sys
  con = sqlite3.connect()
  con.row_factory = sqlite3.Row
  cur = con.cursor()
  try:
      try:
          cur.execute('spresult select * from sqlite_master')
      except sqlite3.OperationalError, (errmsg):
          print "sqlite3.OperationalError: %s" % (errmsg)
      except sqlite3.ProgrammingError, (errmsg):
          print "sqlite3.ProgrammingError: %s" % (errmsg)
      except:
          print "Unexpected error: %s" % sys.exc_info()[0]
  finally:
      try:
          con.close()  # releases recources, doesn't actually close sqlite3* handle
      except:
          print "Unexpected error closing cursor: %s", sys.exc_info()[0]
$$ language python;

This implementation for Python stored procs is accomplished by creating a Python interpreter instance; then taking the current connection, (of type sqlite3*) and sticking this pointer value into a Python global variable named “sqlite3_db_handle”, so that the modified pysqlite2 module can access it when it instantiates a “dbapi2.connection” with zero arguments. (Normally, at least one argument is required) The embedded interpreter is provided via a SQLite extension (loadable shared library). This is a scalable approach since the core code does not need to be recompiled for each new language implementation, assuming certain data structures are implemented by the extension.

Building:

On MacOSX Snow Leopard (osx 10.6.x), make sure gcc-4.0 is used.

build sqlite3 and libsqlite3:

The “runconfigure.sh” script is just a one-liner:

../configure --enable-load-extension --enable-debug –enable-stored-procedures

Note that the original, unaltered code is in the “master” branch, while the code with the stored procs changes is on the “sproc-1” branch.

$ git clone git://github.com/wolfch/sqlite-3.7.3.p1
$ cd sqlite-3.7.3
$ git checkout sproc-1
$ mkdir ./build
$ cp runconfigure.sh ./build
$ cd ./build
$ ./runconfigure.sh
$ export LIBS=-ldl

(the “LIBS” environment setting only required on Linux, not MacOSX – and I have not tried building on Windoze)

$ make

Build pysqlite2

$ cd ../../
$ git clone git://github.com/wolfch/pysqlite-2.6.0
$ cd pysqlite-2.6.0
$ git checkout sproc-1
$ sudo python setup.py install

Note that the original, unaltered code is in the “master” branch, while the code with the stored procs changes is on the “sproc-1” branch.

Build python stored proc extension

On Linux, if you don’t have the python-dev package:

$ sudo apt-get install python2.6-dev

On MacOSX, it comes with Python-2.4 and the dev libraries and headers already.

$ cd sqlite-3.7.3/ext/pyproc
$ make -f Makefile.darwin  (on Mac)
$ make -f Makefile.linux   (on Linux)

Generating code documentation using “doxygen”

(see http://www.doxygen.org)

$ cd sqlite-3.7.3
$ doxygen

[lots of output… the directory “codegen” is created and populated]

Open: codedoc/html/index.html

See also:

sqlite-3.7.3/Doxfile

https://github.com/wolfch/sqlite-3.7.3.p1

https://github.com/wolfch/pysqlite-2.6.0

Posted by Admin on Wednesday, January 26, 2011