SQLite is implemented in ANSI-C. But many of the C code files that are input to the C compiler are generated from other scripts and programs rather than being typed in manually. The diagram to the right shows the complete build process.
In the diagram, red ovals are original source files from the configuration management system. Green ovals are C code that is automatically generated. Blue rectangles are build tools and compilers that are needed on the host platform in order to compile SQLite. Yellow rectangles are build tools and compilers for which the source code is part of the SQLite source tree. The final output (the SQLite library) is a purple oval near the bottom of the diagram.
The files contained within the light-blue bubble are the C code files that become part of the SQLite library. You will notice that some files from CVS are within the light-blue bubble and others are not. Not every code file in the CVS repository ends up being part of the SQLite library. On the download page, the downloads with names of the form sqlite-X.X.X.tar.gz are snapshots of the CVS tree. These are the red ovals. The downloads with names of the form sqlite-source-X_X_X.zip contain just the files inside the light-blue bubble.
These are the generated C code files:
- keywordhash.h. This file contains C code to implement a hash table of all of the SQL keywords that SQLite understands. Do not be misled by the ".h" suffix - this file contains actual code, not just declarations. The reason for using ".h" instead of ".c" is that the file is #include-ed into the middle of tokenize.c. The keywordhash.h code file is generated by a custom C program named mkkeywordhash.c. We might just as easily have hand-coded the keyword hash table, and in fact that was done in earlier versions of SQLite. But the hash table that mkkeywordhash.c generates is optimized for both speed and size. It saves about 2K of code space. And when you are trying to build an SQL database engine that will fit on embedded devices, every little bit of code space helps.
- sqlite3.h. This is the header file that defines the programmer API for SQLite. This is mostly just a copy of the sqlite.h.in file from CVS with the current library version number from the file named VERSION inserted in strategic places.
- parse.c and parse.h. These files implement the SQL parser for SQLite. The input grammar is in a source file named parse.y. This parse.y file is converted into C code by the Lemon parser generator. The source code to Lemon is part of the SQLite source tree. The lemon.c file is compiled to generate the lemon executable. Then the lemon executable is run with parse.y as its input to generate the output files. The lempar.c file is a template used by Lemon to generate its output files.
- opcodes.h. This file contains #defines that map opcode names into opcode numbers for the Virtual Database Engine (VDBE) in the core of SQLite. It is generated by an AWK script that uses both the parse.h file generated by Lemon and the vdbe.c source file from CVS as inputs. The vdbe.c file is the implementation of the virtual machine. It is scanned to figure out which opcodes are needed. The parse.h file is used because for efficiency reasons we want to make some of the VDBE opcodes have the same numeric value as token codes in the parser. For example, the token code for the "+" operator is the same as the addition opcode in the VDBE. Arranging things this way makes code generation much easier.
- opcodes.c. This file maps VDBE opcode numbers back into symbolic names so that symbolic opcode names (rather than obscure opcode numbers) can appear in the output of EXPLAIN.
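To give a feel for what a keyword hash table like the one in keywordhash.h does, here is a minimal hand-written sketch in C. It is not the generated code: the real table is produced by mkkeywordhash.c and is laid out much more compactly. The function names kw_hash, kw_init and kw_lookup, the four-keyword list, and the token code values are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; in SQLite the real ones come from parse.h. */
enum { TK_ID = 0, TK_SELECT, TK_FROM, TK_WHERE, TK_INSERT };

typedef struct Keyword {
  const char *zName;   /* keyword text, upper case */
  int tokenCode;       /* token code handed back to the tokenizer */
} Keyword;

#define KW_SLOTS 16    /* power of two, larger than the keyword count */

static const Keyword aKeyword[] = {
  { "SELECT", TK_SELECT }, { "FROM",   TK_FROM   },
  { "WHERE",  TK_WHERE  }, { "INSERT", TK_INSERT },
};
#define KW_COUNT (sizeof(aKeyword)/sizeof(aKeyword[0]))

static int aSlot[KW_SLOTS];   /* index+1 into aKeyword[], 0 means empty */
static int kwReady = 0;

/* Case-insensitive hash of the first n bytes of z. */
static unsigned kw_hash(const char *z, size_t n){
  unsigned h = 0;
  size_t i;
  for(i=0; i<n; i++) h = h*31 + (unsigned)toupper((unsigned char)z[i]);
  return h & (KW_SLOTS-1);
}

/* Fill the slot table at run time. A generated table can instead be
** emitted as precomputed constant data, avoiding this step. */
static void kw_init(void){
  size_t i;
  for(i=0; i<KW_COUNT; i++){
    unsigned h = kw_hash(aKeyword[i].zName, strlen(aKeyword[i].zName));
    while( aSlot[h] ) h = (h+1) & (KW_SLOTS-1);   /* linear probing */
    aSlot[h] = (int)i + 1;
  }
  kwReady = 1;
}

/* Return the token code for z[0..n-1], or TK_ID if it is not a keyword. */
int kw_lookup(const char *z, size_t n){
  unsigned h;
  assert( z!=0 );
  if( !kwReady ) kw_init();
  h = kw_hash(z, n);
  while( aSlot[h] ){
    const Keyword *p = &aKeyword[aSlot[h]-1];
    if( strlen(p->zName)==n ){
      size_t i;
      for(i=0; i<n && toupper((unsigned char)z[i])==p->zName[i]; i++){}
      if( i==n ) return p->tokenCode;
    }
    h = (h+1) & (KW_SLOTS-1);
  }
  return TK_ID;
}
```

A tokenizer would call kw_lookup() on each identifier-like token it scans. Anything that is not found in the table is treated as an ordinary identifier.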
Generating the processed C code can be a little bit tricky. Note the dependency trace from parse.h to opcodes.h to opcodes.c. You have to be careful to do things in the right order. Fortunately, the makefiles do this for you automatically.
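The reason for the parse.h to opcodes.h to opcodes.c ordering can be sketched in a few lines of C. The three sections below stand in for the three generated files. The numeric values, the OP_MAX macro, and the helper functions opcode_for_token and opcode_name are assumptions made for this illustration. The real values are assigned by Lemon and by the AWK script.

```c
#include <assert.h>

/* Stand-in for parse.h: token codes (values are hypothetical). */
#define TK_PLUS   1
#define TK_MINUS  2

/* Stand-in for opcodes.h: the arithmetic opcodes reuse the token
** codes, so parse.h must exist before this file can be written.
** The remaining opcodes take numbers not claimed by those tokens. */
#define OP_Add       TK_PLUS
#define OP_Subtract  TK_MINUS
#define OP_Goto      3
#define OP_Halt      4
#define OP_MAX       4

/* Stand-in for opcodes.c: names indexed by opcode number, so the
** numbering in opcodes.h must be settled before this table is built. */
static const char *const azOpcodeName[OP_MAX+1] = {
  /* 0 */ "?",
  /* 1 */ "Add",
  /* 2 */ "Subtract",
  /* 3 */ "Goto",
  /* 4 */ "Halt",
};

/* Code generation for a binary operator: because the values match,
** the token code can be used directly as the opcode. */
int opcode_for_token(int tokenCode){
  assert( tokenCode==TK_PLUS || tokenCode==TK_MINUS );
  return tokenCode;
}

/* Map an opcode number back to a symbolic name, as EXPLAIN output needs. */
const char *opcode_name(int op){
  if( op<1 || op>OP_MAX ) return "?";
  return azOpcodeName[op];
}
```

With this arrangement, opcode_for_token(TK_PLUS) is OP_Add without any lookup table, and opcode_name(OP_Add) returns "Add".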
Note: there is a makefile target that will generate just the processed C code and stop. If you type
make target_source
then the makefiles will construct a subdirectory named "tsrc" and put copies of the processed C code into that directory. That is how the sqlite-source-X_X_X.zip downloads are generated: we just run the target_source make target and ZIP up the "tsrc" subdirectory.
After all of the processed C code has been prepared as shown above, the SQLite library is generated simply by passing the processed C code into an ordinary C compiler.
Building The Amalgamation
Beginning with version 3.3.14, SQLite is available in the form of a single huge file that contains all of the C code for SQLite. We call this single source file "the amalgamation". The diagram to the right shows how the amalgamation is built.
Very little has changed from the previous diagram. The processed C code in the light-blue bubble is the same and all the steps needed to generate that code are the same. The only difference is in what we do with the processed C code.
To generate the amalgamation, there is a Tcl script named mksqlite3c.tcl that reads the processed C code and copies it all into the amalgamation file, "sqlite3.c", in the right order. The mksqlite3c.tcl script has to take care to replace #includes of internal header files with the actual content of those headers, and to make sure that headers are not included more than once. And it has to add the sources in just the right order. So building the amalgamation is more than just concatenating the files together. But it is not a lot more.
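The include handling described above can be sketched as follows. This is not mksqlite3c.tcl, which is a Tcl script that works on files on disk. It is a small C illustration of the same idea operating on in-memory strings, and the SrcFile type and the functions find_file, emit, copy_file and amalgamate are hypothetical names. Lines of the form #include "x" that name a known internal file are replaced by that file's content the first time and dropped afterwards. All other lines, including system headers, are copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct SrcFile {
  const char *zName;   /* name as written in #include "..." */
  const char *zText;   /* file content */
  int seen;            /* true once the file has been copied */
} SrcFile;

static SrcFile *find_file(SrcFile *aFile, int nFile,
                          const char *zName, size_t nName){
  int i;
  for(i=0; i<nFile; i++){
    if( strlen(aFile[i].zName)==nName
     && memcmp(aFile[i].zName, zName, nName)==0 ) return &aFile[i];
  }
  return 0;
}

/* Append n bytes to zOut, truncating silently if the buffer is full. */
static void emit(char *zOut, size_t nOut, const char *z, size_t n){
  size_t used = strlen(zOut);
  if( used+n >= nOut ) n = nOut - used - 1;
  memcpy(zOut+used, z, n);
  zOut[used+n] = 0;
}

/* Copy one file into zOut, expanding internal #include lines in place. */
static void copy_file(SrcFile *aFile, int nFile, SrcFile *p,
                      char *zOut, size_t nOut){
  static const char zInc[] = "#include \"";
  const size_t nInc = sizeof(zInc)-1;
  const char *z = p->zText;
  p->seen = 1;               /* set first so circular includes stop */
  while( *z ){
    const char *zEol = strchr(z, '\n');
    size_t nLine = zEol ? (size_t)(zEol - z) + 1 : strlen(z);
    SrcFile *pInc = 0;
    if( nLine>nInc && strncmp(z, zInc, nInc)==0 ){
      const char *zName = z + nInc;
      const char *zEnd = (const char*)memchr(zName, '"', nLine - nInc);
      if( zEnd ) pInc = find_file(aFile, nFile, zName, (size_t)(zEnd-zName));
    }
    if( pInc==0 ){
      emit(zOut, nOut, z, nLine);                 /* ordinary line */
    }else if( !pInc->seen ){
      copy_file(aFile, nFile, pInc, zOut, nOut);  /* first use: inline */
    }                                             /* repeat use: drop */
    z += nLine;
  }
}

/* Build the combined source from the files named in azOrder[], in order. */
void amalgamate(SrcFile *aFile, int nFile, const char **azOrder, int nOrder,
                char *zOut, size_t nOut){
  int i;
  assert( nOut>0 );
  zOut[0] = 0;
  for(i=0; i<nOrder; i++){
    SrcFile *p = find_file(aFile, nFile, azOrder[i], strlen(azOrder[i]));
    if( p && !p->seen ) copy_file(aFile, nFile, p, zOut, nOut);
  }
}
```

A caller fills an array of SrcFile entries with the processed sources, lists the .c files in the desired order in azOrder[], and passes an output buffer. The fixed-size buffer and silent truncation are simplifications for the sketch.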
Beginning with version 3.3.15, there is a makefile target that will automatically build the amalgamation. Type:
make sqlite3.c
The makefile will then construct the processed C code and run mksqlite3c.tcl for you.
Attachments:
- make-lib.gif 35363 bytes added by drh on 2007-Apr-07 13:34:31 UTC.
Diagram of SQLite library build process
- make-amal.gif 37905 bytes added by drh on 2007-Apr-07 13:35:16 UTC.
Diagram of the process for constructing the source code amalgamation