*** DRAFT ***
The Lemon LALR(1) Parser Generator
1. Overview

The SQL language parser for SQLite is generated by a code-generator program called "Lemon". Lemon reads a grammar for the input language and emits C code that implements a parser for that language.

1.1. Lemon Source Files And Documentation

Lemon does not have its own source repository. Rather, Lemon consists of a few files in the SQLite source tree:

2. Advantages of Lemon

Lemon generates an LALR(1) parser. Its operation is similar to the more familiar tools Yacc and Bison, but Lemon adds important improvements, including:

2.1. Use of Lemon Within SQLite

Lemon is used in two places in SQLite.

The primary use of Lemon is to create the SQL language parser. A grammar file (parse.y) is compiled by Lemon into parse.c and parse.h. The parse.c file is incorporated into the amalgamation without further modification.

Lemon is also used to generate the parser for query pattern expressions in the FTS5 extension. In this case, the input grammar file is fts5parse.y.
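To make the role of the generated code more concrete, the following C sketch shows the general shape of the calling convention that a Lemon-generated parser presents to its caller: the caller allocates a parser, pushes tokens into it one at a time, signals end of input with token code 0, and then frees the parser. Everything inside the sketch is an assumption made for illustration. The ParseAlloc(), Parse() and ParseFree() functions here are hand-written stand-ins rather than Lemon output, the token codes TK_INTEGER and TK_PLUS are hypothetical, and parse_tokens() is a made-up helper. Consult the Lemon documentation for the exact signatures that the generator emits.

```c
#include <assert.h>
#include <stdlib.h>

/* Token codes. With Lemon these would come from the generated header file.
** The two names below are hypothetical. Code 0 means "end of input". */
#define TK_INTEGER 1
#define TK_PLUS    2

/* STAND-IN for generated code (an assumption for this sketch). A real
** Lemon-generated parser contains LALR(1) tables and a table-driven engine.
** This toy only accepts INTEGER (PLUS INTEGER)* and sums the integers. */
typedef struct ToyParser {
  int expectInt;   /* 1 if the next token must be an INTEGER */
  int sum;         /* Running total of the INTEGER values seen */
  int nErr;        /* Number of syntax errors seen */
} ToyParser;

void *ParseAlloc(void *(*mallocProc)(size_t)){
  ToyParser *p = (ToyParser*)mallocProc(sizeof(*p));
  if( p ){ p->expectInt = 1; p->sum = 0; p->nErr = 0; }
  return p;
}

void ParseFree(void *yyp, void (*freeProc)(void*)){
  if( yyp ) freeProc(yyp);
}

/* Push one token into the parser. pResult plays the role of a
** user-supplied extra argument that receives the final answer. */
void Parse(void *yyp, int yymajor, int yyminor, int *pResult){
  ToyParser *p = (ToyParser*)yyp;
  if( yymajor==0 ){                      /* End of input */
    if( p->expectInt ) p->nErr++;        /* Input ended where INTEGER was needed */
    *pResult = p->nErr ? -1 : p->sum;    /* -1 is this sketch's error marker */
  }else if( yymajor==TK_INTEGER && p->expectInt ){
    p->sum += yyminor;
    p->expectInt = 0;
  }else if( yymajor==TK_PLUS && !p->expectInt ){
    p->expectInt = 1;
  }else{
    p->nErr++;                           /* Unexpected token */
  }
}

/* One token as produced by a (hypothetical) tokenizer */
typedef struct Token { int code; int value; } Token;

/* Driver loop: the caller, not the parser, pulls tokens from the input
** and pushes each one into the parser. */
int parse_tokens(const Token *aTok, int nTok){
  int result = -1;
  void *pParser = ParseAlloc(malloc);
  assert( pParser!=NULL );
  for(int i=0; i<nTok; i++){
    Parse(pParser, aTok[i].code, aTok[i].value, &result);
  }
  Parse(pParser, 0, 0, &result);         /* Token code 0 signals end of input */
  ParseFree(pParser, free);
  return result;
}
```

In this sketch, calling parse_tokens() on the token sequence INTEGER(2) PLUS INTEGER(40) yields the sum of the two integers, while a sequence that ends on PLUS yields the error marker. Because the sketch uses -1 as its error marker, it cannot distinguish an error from a legitimate sum of -1; a real application would report errors through a separate channel.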

2.2. Lemon Customizations Especially For SQLite

One advantage of hosting code-generator tools within the project is that the tools can be tuned to serve the specific needs of the overall project. Lemon has benefited from this effect. Over the years, the Lemon parser generator has been extended and enhanced to provide new capabilities and improved performance to SQLite. A few of the enhancements to Lemon that were designed specifically for use by SQLite include:

Parsing SQL statements is a significant consumer of CPU cycles in any SQL database engine. Ongoing efforts to optimize SQLite have led the developers to spend a lot of time tweaking Lemon to generate faster parsers. These efforts have benefited all users of the Lemon parser generator, not just SQLite. If Lemon had been a separately maintained tool, it would have been more difficult to make coordinated changes to both SQLite and Lemon, and as a result less optimization would have been accomplished. Hence, including the parser generator in the SQLite source tree has turned out to be a net benefit for both the tool itself and for SQLite.

3. History Of Lemon

Lemon was originally written by D. Richard Hipp (also the creator of SQLite) while he was in graduate school at Duke University between 1987 and 1992. The exact creation date of Lemon has been lost, but it was probably sometime around 1990. Lemon generates an LALR(1) parser. There was also a companion LL(1) parser generator tool named "Lime", but the source code for Lime has been lost.

The Lemon source code was originally written as several separate source files, which were only later merged into a single "lemon.c" source file.

The author of Lemon and SQLite (Hipp) reports that his C programming skills were greatly enhanced by studying John Ousterhout's original source code to Tcl. Hipp discovered and studied Tcl in 1993. Lemon was written before then, and SQLite afterwards. There is a clear difference in the coding styles of these two products, with SQLite seeming to be cleaner, more readable, and easier to maintain.