Given a single SQL statement, there might be dozens, hundreds, or even thousands of ways to implement that statement, depending on the complexity of the statement itself and of the underlying database schema. The task of the query planner is to select the algorithm that minimizes disk I/O and CPU overhead.

Additional background information is available in the indexing tutorial document. The Next Generation Query Planner document provides more detail on how the join order is chosen.

2. WHERE Clause Analysis

Prior to analysis, the following transformations are made to shift all join constraints into the WHERE clause:

All NATURAL joins are converted into joins with a USING clause.
All USING clauses (including ones created by the previous step) are converted into equivalent ON clauses.
All ON clauses (including ones created by the previous step) are added as new conjuncts (AND-connected terms) in the WHERE clause.
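The three rewrite steps can be modeled in a few lines of Python. This is a minimal sketch, not SQLite's C implementation. The helper names (`natural_to_using`, `using_to_on`, `on_to_where`), the tables `emp` and `dept`, and the use of plain strings for constraints are all hypothetical. The sketch also assumes the shared columns are listed in the left table's column order, and it omits the outer-join tagging described in the next paragraph.

```python
# Hypothetical model: tables are lists of column names and constraints are
# plain strings. This illustrates the rewrite steps only; it is not SQLite's
# internal representation, and outer-join tagging is not modeled.

def natural_to_using(left_cols, right_cols):
    """Step 1: a NATURAL join becomes USING(every column name both sides share)."""
    right = set(right_cols)
    return [c for c in left_cols if c in right]  # keeps the left table's order

def using_to_on(left_name, right_name, using_cols):
    """Step 2: USING(c1, c2, ...) becomes ON left.c1=right.c1 AND ..."""
    return [f"{left_name}.{c}={right_name}.{c}" for c in using_cols]

def on_to_where(on_terms, where_terms):
    """Step 3: the ON conjuncts are appended to the WHERE clause's conjuncts."""
    return where_terms + on_terms

shared = natural_to_using(["emp_id", "name", "dept_id", "site"],
                          ["dept_id", "site", "title"])
on_terms = using_to_on("emp", "dept", shared)
where = on_to_where(on_terms, ["emp.name LIKE 'A%'"])
print(shared)  # ['dept_id', 'site']
print(where)   # ["emp.name LIKE 'A%'", 'emp.dept_id=dept.dept_id', 'emp.site=dept.site']
```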

SQLite makes no distinction between join constraints that occur in the WHERE clause and constraints in the ON clause of an inner join, since that distinction does not affect the outcome. However, there is a difference between ON clause constraints and WHERE clause constraints for outer joins. Therefore, when SQLite moves an ON clause constraint from an outer join over to the WHERE clause it adds special tags to the Abstract Syntax Tree (AST) to indicate that the constraint came from an outer join and from which outer join it came. There is no way to add those tags in pure SQL text. Hence, the SQL input must use ON clauses on outer joins. But in the internal AST, all constraints are part of the WHERE clause, because having everything in one place simplifies processing.
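The difference for outer joins is easy to observe with Python's built-in `sqlite3` module. In this sketch, which uses made-up tables `t1` and `t2`, the same constraint `t2.b = 10` produces different results depending on whether it is written in the ON clause or in the WHERE clause of a LEFT JOIN. That is why a constraint moved out of an outer join's ON clause has to remember where it came from.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1(a INTEGER);
    CREATE TABLE t2(a INTEGER, b INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1, 10);
""")

# Constraint in the ON clause: it only decides which t2 rows match.
# Unmatched t1 rows still appear, padded with NULLs.
on_rows = con.execute(
    "SELECT t1.a, t2.b FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.b = 10 "
    "ORDER BY t1.a"
).fetchall()

# Same constraint in the WHERE clause: it filters the joined result, so the
# NULL-padded row for t1.a = 2 is discarded (NULL = 10 is not true).
where_rows = con.execute(
    "SELECT t1.a, t2.b FROM t1 LEFT JOIN t2 ON t1.a = t2.a WHERE t2.b = 10 "
    "ORDER BY t1.a"
).fetchall()

print(on_rows)     # [(1, 10), (2, None)]
print(where_rows)  # [(1, 10)]
```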

After all constraints have been shifted into the WHERE clause, the WHERE clause is broken up into conjuncts (hereafter called "terms"). In other words, the WHERE clause is broken up into pieces separated from the others by an AND operator. If the WHERE clause is composed of constraints separated by the OR operator (disjuncts), then the entire clause is considered to be a single "term" to which the OR-clause optimization is applied.
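The splitting step can be sketched as a recursive walk over the expression tree. The tuple-based tree below is a hypothetical stand-in for SQLite's parse tree. The point it illustrates is that only AND nodes are split, so a WHERE clause whose top-level operator is OR comes back as one term.

```python
# Hypothetical mini-AST: a leaf is a string, and an interior node is a tuple
# (op, left, right) where op is "AND" or "OR".

def split_terms(expr):
    """Return the AND-connected terms of expr, flattening nested ANDs.
    Anything that is not an AND node (including an OR node) is one term."""
    if isinstance(expr, tuple) and expr[0] == "AND":
        _, left, right = expr
        return split_terms(left) + split_terms(right)
    return [expr]

where = ("AND", ("AND", "x=1", "y>2"), ("OR", "z=3", "w=4"))
print(split_terms(where))    # ['x=1', 'y>2', ('OR', 'z=3', 'w=4')]

only_or = ("OR", "z=3", "w=4")
print(split_terms(only_or))  # [('OR', 'z=3', 'w=4')]
```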

All terms of the WHERE clause are analyzed to see if they can be satisfied using indexes. To be usable by an index, a term must usually be of one of the following forms:


  column = expression
  column IS expression
  column > expression
  column >= expression
  column < expression
  column <= expression
  expression = column
  expression IS column
  expression > column
  expression >= column
  expression < column
  expression <= column
  column IN (expression-list)
  column IN (subquery)
  column IS NULL
  column LIKE pattern
  column GLOB pattern
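For the comparison operators, the "expression op column" forms are mirror images of the "column op expression" forms, so a term such as `5 > b` means the same as `b < 5`. The sketch below shows that normalization. It uses a hypothetical helper `normalize` and a simplified model in which a column is recognized by name. IN, IS NULL, LIKE, and GLOB are outside what it handles.

```python
# Operator to use when the operands of a comparison are swapped:
# "expression > column" means the same thing as "column < expression".
FLIPPED = {"=": "=", "IS": "IS", ">": "<", ">=": "<=", "<": ">", "<=": ">="}

def normalize(lhs, op, rhs, columns):
    """Rewrite a binary comparison term into (column, op, expression) form.

    `columns` is the set of column names of the indexed table (a simplifying
    assumption: real terms are expression trees, not strings). Returns None
    when the operator is not a comparison or neither side is a bare column.
    """
    if op not in FLIPPED:
        return None
    if lhs in columns:
        return (lhs, op, rhs)
    if rhs in columns:
        return (rhs, FLIPPED[op], lhs)
    return None

cols = {"a", "b", "c"}
print(normalize("a", ">=", "5", cols))   # ('a', '>=', '5')
print(normalize("5", ">", "b", cols))    # ('b', '<', '5')
print(normalize("a+1", "=", "7", cols))  # None
```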

If an index is created using a statement like this:

CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);

Then the index might be used if the initial columns of the index (columns a, b, and so forth) appear in WHERE clause terms. The initial columns of the index must be used with the =, IN, or IS operators. The right-most column that is used can employ inequalities. For the right-most column of an index that is used, there can be up to two inequalities that sandwich the allowed values of the column between two extremes.
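You can check which terms SQLite actually uses with EXPLAIN QUERY PLAN. The following sketch uses Python's `sqlite3` module with a five-column version of the `ex1` table. The column list is an assumption, since the index definition above is elided. The exact wording of the plan text varies between SQLite versions, but the detail line should name `idx_ex1` and list constraints such as `a=?`, `b=?`, and the two bounds on `c`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed, shortened versions of the ex1 table and idx_ex1 index.
con.execute("CREATE TABLE ex1(a, b, c, d, e)")
con.execute("CREATE INDEX idx_ex1 ON ex1(a, b, c, d)")

def plan(sql):
    # The detail text is the last column of each EXPLAIN QUERY PLAN row;
    # its exact wording differs between SQLite versions.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# = on a and b, then two inequalities sandwiching c.
for line in plan("SELECT * FROM ex1 WHERE a=1 AND b=2 AND c>10 AND c<20"):
    print(line)
```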

It is not necessary for every column of an index to appear in a WHERE clause term in order for that index to be used. However, there cannot be gaps in the columns of the index that are used. Thus, for the example index above, if there is no WHERE clause term that constrains column c, then terms that constrain columns a and b can be used with the index, but terms that constrain columns d through z cannot. Similarly, index columns will not normally be used (for indexing purposes) if they are to the right of a column that is constrained only by inequalities. (See the skip-scan optimization below for the exception.)
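The no-gaps rule amounts to walking the index columns from left to right. The hypothetical `usable_prefix` function below is a simplified model, not SQLite's cost-based planner. It takes the set of operators applied to each column, ignores skip-scan, and does not check that at most one lower and one upper bound are present.

```python
# Hypothetical model of the "no gaps" rule; skip-scan is deliberately ignored.
EQUALITY_OPS = {"=", "IN", "IS"}
INEQUALITY_OPS = {">", ">=", "<", "<="}

def usable_prefix(index_cols, constraints):
    """Return the leading index columns whose WHERE terms can drive a lookup.

    `constraints` maps a column name to the set of operators that WHERE
    terms apply to that column (after normalizing to "column op expression").
    """
    used = []
    for col in index_cols:
        ops = constraints.get(col, set())
        if ops & EQUALITY_OPS:
            used.append(col)  # equality-style constraint: keep going right
            continue
        if ops & INEQUALITY_OPS:
            used.append(col)  # inequalities: usable, but only as the last column
        break                 # an inequality or a gap (no constraint) ends the prefix
    return used

idx = ["a", "b", "c", "d", "e"]
print(usable_prefix(idx, {"a": {"="}, "b": {"IN"}, "d": {"="}}))
# ['a', 'b']       (no term on c, so the term on d is not used)
print(usable_prefix(idx, {"a": {"="}, "b": {">", "<"}, "c": {"="}}))
# ['a', 'b']       (b is bounded by inequalities, so c is not used)
print(usable_prefix(idx, {"a": {"IS"}, "b": {"="}, "c": {">="}}))
# ['a', 'b', 'c']
```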

In the case of indexes on expressions, whenever the word "column" is used in the foregoing text, one can substitute "indexed expression" (meaning a copy of the expression that appears in the CREATE INDEX statement) and everything will work the same.
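The sketch below shows an index on an expression, again using Python's `sqlite3` module. The `people` table and the `idx_lower_name` index are illustrative, and expression indexes need SQLite 3.9.0 or later. The term `lower(name) = 'alice'` plays the role that "column = expression" plays for an ordinary index, and `age > 30` supplies the inequality on the next index column. The WHERE clause writes the expression exactly as it appears in the CREATE INDEX statement.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people(name TEXT, age INTEGER)")
# Index on an expression rather than a plain column (SQLite 3.9.0 and later).
con.execute("CREATE INDEX idx_lower_name ON people(lower(name), age)")

sql = "SELECT age FROM people WHERE lower(name) = 'alice' AND age > 30"
# Print the plan detail (last column); its wording varies between versions.
for row in con.execute("EXPLAIN QUERY PLAN " + sql):
    print(row[-1])
```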