Performance Tuning

Any questions about the content of this page can be sent to vbsqliteNOSPAM@NOSPAMag-software.com (just remove the NOSPAM).

This page is an area where you can place code, hints, suggestions, or text related to increasing the speed of SQLite databases.

Place indexes on join and WHERE columns

Placing indexes on columns that are used in joins or in WHERE clauses is the best way to increase query speed.
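
For example (the table and column names below are made up for illustration, not taken from any real schema), if you often join orders to customers and filter on the order date, indexes on the join column and on the WHERE column are what the query needs:

    CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders(id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);

    -- index on the column used to join orders to customers
    CREATE INDEX idx_orders_customer ON orders(customer_id);

    -- index on the column used in the WHERE clause
    CREATE INDEX idx_orders_date ON orders(order_date);

    SELECT customers.name, orders.order_date
      FROM customers, orders
     WHERE orders.customer_id = customers.id
       AND orders.order_date = '2004-01-13';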

Use multi-column indexes

Multi-column (composite) indexes are indexes that comprise more than one column. They increase the speed of queries where you join one table on several columns or select rows based on multiple columns. (This is true in other DB systems; I don't know whether it holds true in SQLite.)
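
As a small sketch (again with an invented schema), a single index that spans both columns used in the WHERE clause lets SQLite find the matching rows with one index lookup instead of filtering on the second column afterwards:

    CREATE TABLE events(category TEXT, priority INTEGER, message TEXT);

    -- one index covering both columns that are filtered on together
    CREATE INDEX idx_events_cat_prio ON events(category, priority);

    SELECT message FROM events WHERE category='backup' AND priority=1;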

How SQLite works out JOINs

From an email by D. Richard Hipp on 13 January 2004:

SQLite implements JOIN USING by translating the USING clause into some extra WHERE clause terms. It does the same with NATURAL JOIN and JOIN ON. So while those constructs might be helpful to the human reader, they don't really make any difference to SQLite's query optimizer.

SQLite converts joins into WHERE terms. This requires CPU time to perform the conversion, however small that time might be. If you are issuing a lot of SQL statements that each return a small number of rows, it might reduce the overall time (CPU cost) to remove the JOINs by hand and replace them with WHERE clauses. While the savings might be small, in tight loops of thousands of statements or in low-memory situations it could be worth the coding. Once SQLite supports a prepare/precompile interface, precompiling the SQL statements will be the better way to go.
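
For example, the two statements below ask for exactly the same rows; the second is simply written in the form that SQLite would translate the first one into (the tables a and b match the examples later on this page, and the extra column z is invented for the example):

    -- JOIN syntax
    SELECT * FROM a JOIN b ON a.x=b.y WHERE b.z=5;

    -- the same query hand-rewritten with a plain WHERE clause
    SELECT * FROM a, b WHERE a.x=b.y AND b.z=5;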

Order your tables properly in the FROM clause

List first the table from which a WHERE condition (preferably on an indexed column) eliminates the most rows, so as to limit the number of inner-loop join operations required.

The following is from a message posted by D. Richard Hipp to the SQLite mailing list regarding join translations:

When SQLite sees this:

    SELECT * FROM a JOIN b ON a.x=b.y;

It translates it into the following before compiling it:

    SELECT * FROM a, b WHERE a.x=b.y;

Neither form is more efficient than the other. Both will generate identical code. (There are subtle differences with a LEFT OUTER JOIN, but those details can be ignored when you are looking at things at a high level, as we are.)

SQLite implements joins using nested loops with the outer loop formed by the first table in the join and the inner loop formed by the last table in the join. So for the example above you would have:

    For each row in a:
      For each row in b such that b.y=a.x:
        Return the row

If you reverse the order of the tables in the FROM clause like this:

    SELECT * FROM b, a WHERE a.x=b.y;

You should get an equivalent result on output, but SQLite will implement the query differently. Specifically it does this:

    For each row in b:
      For each row in a such that a.x=b.y:
        Return the row

The trick is that you want to arrange the order of tables so that the "such that" clause on the inner loop is able to use an index to jump right to the appropriate row instead of having to do a full table scan. Suppose, for example, that you have an index on a(x) but not on b(y). Then if you do this:

    SELECT * FROM a, b WHERE a.x=b.y;
    For each row in a:
      For each row in b such that b.y=a.x:
        Return the row

For each row in a, you have to do a full scan of table b. So the time complexity will be O(N^2). But if you reverse the order of the tables in the FROM clause, like this:

    SELECT * FROM b, a WHERE b.y=a.x;
    For each row in b:
      For each row in a such that a.x=b.y:
        Return the row

Now the inner loop is able to use an index to jump directly to the rows in a that it needs and does not need to do a full scan of the table. The time complexity drops to O(NlogN).

So the rule should be: For every table other than the first, make sure there is a term in the WHERE clause (or the ON or USING clause if that is your preference) that lets the search jump directly to the relevant rows in that table based on the results from tables to the left.

Other database engines with more complex query optimizers will typically attempt to reorder the tables in the FROM clause in order to give you the best result. SQLite is more simple-minded: it codes whatever you tell it to code.

Before you ask, I'll point out that it makes no difference whether you say "a.x=b.y" or "b.y=a.x". They are equivalent. All of the following generate the same code:

      ON a.x=b.y
      ON b.y=a.x
      WHERE a.x=b.y
      WHERE b.y=a.x
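
To put the rule above into practice (a sketch reusing the a and b tables from the message, with an invented index name): create an index on the join column of the table you want in the inner loop, and list that table last in the FROM clause.

    -- lets the inner loop jump straight to the matching rows of a
    CREATE INDEX idx_a_x ON a(x);

    -- b drives the outer loop; a is searched through the index
    SELECT * FROM b, a WHERE a.x=b.y;

If you are unsure which plan SQLite chose, EXPLAIN SELECT ... prints the generated VDBE program; the output format varies between SQLite versions, but it shows whether an index is being used.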

Indexes on INTEGER PRIMARY KEY columns (don't do it)

When you create a column with INTEGER PRIMARY KEY, SQLite uses this column as the key for (index to) the table structure. This is a hidden index (it isn't displayed in the sqlite_master table) on this column. Adding another index on the column is not needed and will never be used. In addition, it will slow down INSERT, DELETE and UPDATE operations.
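
To make that concrete (schema invented for illustration), the second statement below buys nothing and only slows down writes:

    CREATE TABLE items(id INTEGER PRIMARY KEY, name TEXT);

    -- redundant: id is already the key of the table structure; this index
    -- will never be used and makes INSERT, UPDATE and DELETE slower
    CREATE INDEX idx_items_id ON items(id);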

Use transactions when updating tables

Make sure that you wrap multiple updates inside a single transaction, e.g.:

BEGIN TRANSACTION;
UPDATE table1 SET col1='1';
UPDATE table1 SET col1='2';
...
INSERT INTO table1(col1) VALUES ('2');
...
COMMIT TRANSACTION;

(The word "TRANSACTION" is optional.)

Using a transaction is the fastest way to update data in SQLite. Basically, this is how it works: every statement that is not wrapped in an explicit transaction runs in its own implicit transaction, and at the end of each transaction SQLite has to commit its changes to the database file and rebuild its internal structures, which takes time. So if you have 100 updates and don't use a transaction, SQLite pays that per-transaction cost 100 times. Using transactions improves speed. Use them.

How to get SQLite to use multiple indexes

SQLite can use multiple indexes on a single query, but you have to tell it to do so explicitly by restructuring your SQL.

As an example, consider this query:

     SELECT * FROM table1 WHERE a=5 AND b=11;

Suppose there are two indices:

     CREATE INDEX index1 ON table1(a);
     CREATE INDEX index2 ON table1(b);

As written, SQLite will only use one of these two indices to perform the query. The choice is arbitrary (unless you have run ANALYZE, in which case SQLite has some information to help it pick the "best" index).

If you want to use both indices, rewrite the query this way:

     SELECT * FROM table1 WHERE rowid IN
         (SELECT rowid FROM table1 WHERE a=5
           INTERSECT SELECT rowid FROM table1 WHERE b=11);

The optimizer in PostgreSQL will make this change for you automatically and will use a bitmap to implement the IN operator and the INTERSECT. With SQLite, though, you have to type in the expanded version yourself. And because rowids in SQLite are user-visible and changeable, and can therefore be sparsely distributed, SQLite is unable to use bitmaps to optimize the computation. But modulo the bitmap optimization, SQLite gives you all the capabilities of PostgreSQL; you just have to type the query in yourself rather than letting the optimizer do it for you.

A SELECT which only accesses the index of a table, not the table row data

Checked only with version 2.8.18! When you only want to check whether something exists in a table which has an index (e.g. an index on "hour"), you might do it like this:

     SELECT hour FROM appointments WHERE hour=5 LIMIT 1;
     - or worse -
     SELECT * FROM appointments WHERE hour=5 LIMIT 1;

But both first check the index and then fetch the value of "hour" from the table row, not from the index itself. So what is quicker is simply this:

     SELECT 1 FROM appointments WHERE hour=5 LIMIT 1;

Now SQLite doesn't have to read the value at all; it just puts the number 1 in the result row, and you know there are appointments at 5. This is mostly useful when, for example, you show an overview of a day and have to indicate at which hours there are appointments. Personally, I used this on a not-so-powerful embedded system, so the gain will of course be less noticeable on a PC platform.
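
For completeness, here is a sketch of the kind of schema and index the example above assumes (the names are guesses for illustration, not from a real application):

     CREATE TABLE appointments(hour INTEGER, description TEXT);
     CREATE INDEX idx_appointments_hour ON appointments(hour);

     -- existence check: per the note above, SQLite can answer this from the
     -- index without reading the table row
     SELECT 1 FROM appointments WHERE hour=5 LIMIT 1;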

Prepared statements must be generated inside your transaction

Entered by Nathan P Sharp, Jan 2008

While trying to improve the performance of bulk imports in our C++ project, we found that creating the prepared statements was a large performance hit. Creating them once at the construction of our class, though, made the problem worse! It turns out that prepared statements generated before the transaction starts do not work with the transaction. The fix was simply to create new prepared statements once per transaction.