Every SQL database engine works in roughly the same way: It first translates the input SQL text into a "prepared statement". Then it "executes" the prepared statement to generate a result.
A prepared statement is an object that represents the steps needed to accomplish the input SQL. Or, to think of it in another way, the prepared statement is the SQL statement translated into a form that is more easily understood by the computer.
In SQLite, a prepared statement is an instance of the sqlite3_stmt object. In other systems, the prepared statement is usually an internal data structure that is not directly visible to the application programmer. Developers of other SQL database engines do not necessarily call these objects "prepared statements". But such objects exists, whatever they might be called. This paper will use the term "prepared statement".
There are countless ways of implementing a prepared statement. This paper will look at the two most common methods:
Bytecode → The input SQL is translated into a virtual machine language that is then run by a virtual machine interpreter. This is the technique used by SQLite.
Tree-Of-Objects → The input SQL is translated in a tree of objects that represent the processing to be done. The SQL is executed by walking this tree. This is the technique used by MySQL and PostgreSQL.
There are advantages and disadvantages to each of these representations of a prepared statement. The purpose of this paper is to articulate some of those advantages and disadvantages.
This document is written from the perspective of the original author of SQLite. If you disagree with any of the opinions offered in this document, you are welcomed to offer corrections and/or contrary views on the SQLite Forum. Or you can email the author directly.
The bytecode generated by SQLite might be a little different from what many readers think of as bytecode. The bytecode used (for example) by the Java virtual machine or by WebAssembly consists almost entirely of low-level operations, similar to what physical CPUs implement: basic math operators, comparisons, conditional jumps, and instructions to move content between different memory locations. SQLite bytecode has these kinds of low-level instructions, too. But SQLite bytecode also contains some high-level operations that are specific to the needs of a database engine. Here are just a few examples:
OP_Column → Extract the value from the N-th column of the database row that a particular cursor is currently pointing at.
OP_CreateBtree → Allocate space for a new B-Tree in the database file.
OP_ParseSchema → Reread and reparse all or part of the sqlite_schema table and refresh internal symbol tables accordingly.
OP_SeekGE → Move a cursor on a particular B-Tree to the first entry that is greater than or equal to a given key.
OP_Next → Advance a cursor on a particular B-Tree to the next entry in the B-Tree and jump, or fall through if there are no more entries in that B-Tree.
In other words, the "bytecode" used by SQLite is not so much a set of CPU instructions as it is a list of database primitives that are to be run in a particular order.
An "Abstract Syntax Tree" or AST is a data structure that describes a program or statement in some kind of formal language. In our context, the formal language is SQL. An AST is typically implemented as a tree of objects where each object represents one small part of the overall SQL statement. ASTs emerge naturally from parsers for formal languages. The usual technique is to use an LALR(1) parser. With such a parser, each terminal symbol holds metadata that will become a leaf of the AST, and each non-terminal symbol holds metadata that will become a sub-branch of the overall AST. As rules of the grammar are "reduced" by the parser, new nodes of the AST are allocated and connected to subnodes. After the parse completes, the start-symbol of the grammar is left holding the root of the AST.
An AST is a tree of objects. But an AST is not a suitable form for a prepared statement. After being generated, an AST first needs to be transformed in various ways before it can executed. Symbols need to be resolved. Semantic rules need to be checked. Optimizations need to be applied that transform input SQL statement into different forms that execute more quickly. Finally, the AST needs to be translated into an alternative representation that is more amenable to execution.
Some people refer to the tree of objects that is used as the executable form for MySQL and PostgreSQL as an AST. This is probably a misuse of the term "AST", because by the time the tree of objects is ready to be executed, it has been changed so much that it has little resemblance to the original SQL text. The confusion arises in part because both the final prepared statement object and the original AST are both trees of objects. The usual technique is for the original AST that comes directly out of the parser to be transformed little by little, in multiple passes, until at the end it is fully converted into an tree of objects that is no longer strictly an AST but that can be evaluated to generate a result. There is not necessarily a clear point during this process when the tree-of-objects ceases to be an AST and becomes a prepared statement instead. And because there is no clear boundary between an AST and a prepared statement, people often refer to a prepared statement that is represented as a tree of objects as an "AST", even though that description is not precise.
Dataflow programming is a style of programming in which individual nodes specialize in doing one small part of the overall computation. Each node receives inputs from other nodes and sends its output to other nodes. Thus the nodes form a directed graph that carry inputs into outputs.
A "dataflow program" is perhaps a better description than "AST" for the tree of objects that an SQL database engine uses as a prepared statement.
SQLite compiles to bytecode, and the SQLite developers are very happy with this approach. Here is why:
A flat list of opcodes can be easily printed to see exactly how an SQL statement is being implemented. This is what happens in SQLite when you preface an SQL statement with the "EXPLAIN" keyword: Instead of actually running the SQL, the result is a listing of the bytecode that would have been used to implement that SQL.
Bytecode lends itself to this because a bytecode program is easily represented as a table. In SQLite bytecode, each instruction has one opcode and five operands. Thus a prepared statement can be rendered as if it were a query against a six-column table.
A tree-of-objects representation is more difficult to publish in a human-readable form. The objects that comprise the tree tend to all be very different, and thus it is tricky to come up with a consistent and simple table representation with which to display the objects. Any such table representation that you do come up with would almost certainly have more than six columns, probably many more. The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it, as far as I know. Hence, no tree-of-objects database engine provides the level of detail in their "EXPLAIN" output that SQLite provides.
Bytecode provides a clear separation between front-end parsing and analysis and back-end evaluation of an SQL statement. When problems arise (incorrect answers and/or poor performance), the developers can examine the bytecode to quickly determine if the source of the trouble is either the front-end analysis or the back-end data storage section of the product.
In debugging builds of SQLite, the PRAGMA vdbe_trace=ON; command will cause a trace of the bytecode execution to appear on the console.
SQL statements written in bytecode can be evaluated incrementally. For example, a statement can be run until it generates just its first row of output. The statement then pauses until it is stepped again. It is not necessary to run the statement to completion before examining the first row of output.
This is more difficult to achieve in a tree-of-objects design. When the prepared statement is a tree-of-objects, execution is normally accomplished by walking the tree. To pause the statement in the middle of a computation means unwinding the stack back up to the caller, all the while saving enough state to resume evaluation where it last left off. This is not impossible to do, but is sufficiently difficult that I have never seen it actually done.
Most SQL database engines do not really need to do incremental execution of prepared statements because most SQL database engines are client/server. In client/server engines, a single SQL statement is sent to the server, then the complete reply comes back over the wire all at once. Thus each statement runs to completion in a single go. But SQLite is not client/server. SQLite is a library that runs in the same address space and using the same stack as the application. Being able to easily and reliably perform incremental execution of an SQL statement is important to SQLite.
The bytecode generated by SQLite is usually smaller than the corresponding AST coming out of the parser. During initial processing of SQL text (during the call to sqlite3_prepare() and similar) both the AST and the bytecode exist in memory at the same time, so more memory is used then. But that is a transient state. The AST is quickly discarded and its memory recycled, even before the call to sqlite3_prepare() returns, so the resulting prepared statement ends up consuming less memory in its bytecode representation than it did as an AST. This is important because calls to sqlite3_prepare() are transient, but prepared statements are often cached for possible reuse and persist in memory for a long time.
I believe that a bytecode representation of a prepared statement runs faster, because fewer decisions need to be made for each step of the computation. Emphasis on "believe" in the previous sentence → it is difficult to verify this claim experimentally since nobody has ever put in the multiple years of effort necessary to generate equivalent bytecode and tree-of-object representations of a prepared statement to see which one actually runs faster. We do know that SQLite is very fast, but we do not have good side-by-side comparisons with other SQL databases since the other databases spend a lot of time doing client/server message processing, and it is difficult to untangle the message round-trip overhead from the actual processing time.
The SQLite developers think that the bytecode approach is best, at least for the use cases the SQLite tries to fill, but the tree-of-objects approach to processing SQL does have some advantages over bytecode. There are always tradeoffs.
When a prepared statement is bytecode, once the bytecode has been generated, the algorithm is fixed and cannot be subsequently changed without completely rewriting the bytecode. This is not the case with a tree-of-objects prepared statement. A tree-of-objects is easier to modify on-the-fly. The query plan is mutable and can be tweaked as it is running, based on the progress of the query. Thus a query can be dynamically self-tuning.
In a dataflow program, each processing node can be assigned to a different thread. There needs to be some kind of threadsafe queuing mechanism for transferring intermediate results from one node to the next. But no synchronization primitives are typically needed within each node of the program. Node schedule is trivial: A node becomes eligible to run when it has data available and there is space in its output queue.
This is an important consideration for database engines that are designed to run large analytic queries (OLAP) on large multi-core servers. The primary focus of SQLite is transaction processing (OLTP) on the internet-of-things, so there is less need to represent prepared statements as dataflow programs in SQLite.
This page last modified on 2024-05-09 17:38:03 UTC
*** DRAFT ***