SQLite CVSTrac

idxchk (index check) is a Tcl program that analyzes a SELECT (or other) SQL statement and prints which indexes are used.

Download idxchk from the contrib section: http://sqlite.org/contrib

idxchk uses the output from the EXPLAIN command, along with table and index meta data stored in the sqlite_master table. idxchk works with 3.0.8 through 3.2.7, probably other versions. SQLite opcodes are considered private and can change from version to version, so be warned, that idxchk may not work correctly for other versions.

Oct 20, 2005 - idxchk still works with version 3.2.7, and still provides a little more information than EXPLAIN QUERY PLAN. Most notably, idxchk will report when primary key indexes or rowids are being used in a query. If this reporting can be rolled into EXPLAIN QUERY PLAN, then idxchk can be retired. Also, newer index optimizations in SQLite have rendered some of the examples out of date, see below. -Tom Poindexter

Run idxchk as:

    [ tclsh ] idxchk  [ -v ]  dbname  [ sqlstmt ...]

If no sqlstmts are specified on the command line, idxchk will read from stdin for SQL statements. Multiple statements can be entered, either on the command line or on stdin. All but the last statement is executed, in case you need to attach other databases or create temporary tables. The last statement is analyzed to report index usage. Statements in stdin should be separated by semi-colons. Multiple statements on the command line are executed separately, without need to append a semi-colon. SQL statements specified on the command line must be enclosed in proper shell quoting (single quotes or double quotes).

Adding the -v option will print additional information, including the databases accessed, likely table nesting order, explain opcodes, and internal cursor usage.

idxchk runs on Linux, Unix, and Windows Since idxchk is written in Tcl, you'll need a Tcl interpreter and a SQLite/Tcl binding (e.g. Tcl extension.)

If you don't have Tcl installed, I recommend using TclKit from http://www.equi4.com/pub/tk/8.4.9/. Windows users will want to download the command line shell version of Tclkit (tclkitsh-win32.exe.gz). SQLite Tcl bindings can be downloaded from SQLite's download page for Windows and Linux. For other Unix platforms, you'll need to first compile and install Tcl ( http://sf.net/projects/tcl ), then compile SQLite to produce the Tcl binding.

Alternatively, if you don't want to compile Tcl, you can download ActiveTcl from ActiveState http://activestate.com/Products/ActiveTcl/ for several popular platforms. You may need to edit the third line of idxchk in order to use the correct Tcl intepreter (e.g. tclkit or a different version of tclsh.) Also, change the permissions on the file as chmod +x idxchk. Or, simply specify the Tcl interpreter on the command line before idxchk (E.g., tclsh, tclsh8.4, or tclkit).

idxchk will attempt to load libtclsqlite3.so, tclsqlite3.so, or tclsqlite3.dll from the current directory, and finally via package require sqlite3. Note that you must use the same version of SQLite's Tcl bindings as which form of SQLite you use for production. Otherwise, changes in opcode generation may report erroneous results between idxchk and your actual usage.

See the PerformanceTuning page for information on how JOINs work in SQLite. You might want to also review SQLite internals. The slides at http://www.sqlite.org/php2004/page-001.html are definitive for SQLite 3.x.

idxchk was written by Tom Poindexter, and in the spirit of SQLite, placed in the public domain.

Example

To illustrate idxchk, lets define and populate some tables. Let's organize a few books from my bookshelf:

   $ sqlite3 test.db

   -- authors, just last names to keep it simple.

   create table author (
       author_id      integer primary key,
       name           char(100)
   );
   insert into author values (1, 'Kernighan');
   insert into author values (2, 'Ritchie');
   insert into author values (3, 'Pike');
   insert into author values (4, 'Harrison');
   insert into author values (5, 'McLennan');
   insert into author values (6, 'Ousterhout');
   insert into author values (7, 'Welch');
   insert into author values (8, 'Stevens');
   insert into author values (9, 'Aho');
   insert into author values (10, 'Weinberger');

   -- books, just the title for this example.

   create table book (
       book_id        integer primary key,
       title          char(100)
   );
   insert into book values (1, 'The C Programming Language');
   insert into book values (2, 'The Practice of Programming');
   insert into book values (3, 'Effective Tcl/Tk Programming');
   insert into book values (4, 'Tcl/Tk Tools');
   insert into book values (5, 'Practical Programming in Tcl and Tk');
   insert into book values (6, 'Tcl and the Tk Toolkit');
   insert into book values (7, 'Unix Network Programming');
   insert into book values (8, 'The AWK Programming Language');

   -- a 'writing' is an author writing or contributing to a book.
   -- this is a many-to-many table, allowing a book to have
   -- many authors, or a single author contributing to many books

   create table writing (
       author_id      integer,
       book_id        integer
   );
   insert into writing values (1, 1);
   insert into writing values (2, 1);
   insert into writing values (1, 2);
   insert into writing values (3, 2);
   insert into writing values (4, 3);
   insert into writing values (5, 3);
   insert into writing values (4, 4);
   insert into writing values (7, 5);
   insert into writing values (6, 6);
   insert into writing values (8, 7);
   insert into writing values (1, 8);
   insert into writing values (9, 8);
   insert into writing values (10, 8);

   ^D  (control-D) end of file, Windows users press ^Z (control-Z)

No indexes have yet been defined, but since we have defined the two base tables having an INTEGER PRIMARY KEY, SQLite should use the id column as the table's B-tree index.

Lets get a listing of books and authors:

    $ sqlite3 test.db

    select name, title from author a, book b, writing w
     where w.book_id   = b.book_id
       and w.author_id = a.author_id ;

^D

Since our many-to-many table does not have an index on either foreign key, no indexes will be used, not even the B-Tree index for the base tables:

    $ idxchk test.db

    select name, title from author a, book b, writing w
     where w.book_id   = b.book_id
       and w.author_id = a.author_id

^D

    table   index(es)  column(s)
    ------- ---------- ----------
    author  (none)
    book    (none)
    writing (none)

No indexes are being used in this query, this query will degrade in performance when the number of rows increases. SQLite's method for JOINs loops over the first table in the query. Since we will be traversing the entire writing table, let's make it the first table.

    $ idxchk test.db

    select name, title from writing w, book b, author a
     where a.author_id = w.author_id
       and b.book_id   = w.book_id

^D

    table   index(es)  column(s)
    ------- ---------- ------------------------------
    writing (none)
    book    <pk>       <integer primary key or rowid>
    author  <pk>       <integer primary key or rowid>

Much better!! Each row of the writing table is read sequentially, but we can use the primary key of the two base tables for lookup. Since we defined the book_id and author_id columns as INTEGER PRIMARY KEY, SQLite can use that id as an index.

Suppose we want to select books written by Brian Kernighan. We will also use SQL JOIN syntax. SQLite treats the JOIN syntax exactly the same as specifing multiple tables and specifing the join criteria in the WHERE clause. I personally like the JOIN syntax, as it separates the join criteria from other selection criteria:

    $ idxchk test.db

    select title from writing w
      inner join author a on a.author_id = w.author_id
        inner join book b on w.book_id = b.book_id
    where a.name = 'Kernighan'

^D

    table   index(es)  column(s)
    ------- ---------- ------------------------------
    writing (none)
    author  <pk>       <integer primary key or rowid>
    book    <pk>       <integer primary key or rowid>

Author name is a candidate for indexing, add an index:

    $ sqlite3 test.db

    create index au_name_idx on author (name);

    ^D
    $ idxchk test.db

    select title from writing w
      inner join author a on a.author_id = w.author_id
        inner join book b on w.book_id = b.book_id
    where a.name = 'Kernighan'

^D

    table   index(es)  column(s)
    ------- ---------- ------------------------------
    writing (none)
    author  <pk>       <integer primary key or rowid>
    book    <pk>       <integer primary key or rowid>

What? No improvement. Remember that SQLite always loops over every row in the first table in the query.

Note that recent query plan optimizations in SQLite 3.2.7, will allow the index au_name_idx to be used. The result of running idxchk with 3.2.7 on this query will correctly report that the index is used in this query.

What we might want to do is first create a temporary table of only writings by Kernighan; then use that temp table in a second query. idxchk will execute all statements in batch, and only analyze the last statement, so we can execute the CREATE TEMP TABLE as part of the batch. Let's first check to make sure that the au_name_idx is being used by running it separately:

    $ idxchk test.db

    create temp table bk_writing as
    select book_id from writing where author_id in (
        select author_id from author where name = 'Kernighan'
    )

^D

    table   index(es)   column(s)
    ------- ----------- ------------------------------
    author  <pk>        <integer primary key or rowid>
    author  au_name_idx (name)
    writing (none)

Here's the full query to idxchk, note that we use semi-colons to separate SQL statements:

    $ idxchk test.db

    create temp table bk_writing as
    select book_id from writing where author_id in (
        select author_id from author where name = 'Kernighan'
    )

    select title from bk_writing bk
      inner join book b on b.book_id = bk.book_id

^D

    table      index(es)  column(s)
    ---------- ---------- ------------------------------
    bk_writing (none)
    book       <pk>       <integer primary key or rowid>

Selecting into the temp table first will use the index on author.name, then use the primary key on the book table to select the actual book. I use this pattern quite often in more complex queries:

Select keys and other select columns into TEMP table for rows of interest from a base table. Make sure the selection criteria appropriate indexes.
Join the temp table to other tables for final select. Columns that are joined should have appropriate indexes.

Another way to speed up this query is to add another index on the writing table for author_id. This will allow SQLite to perform the query without need for a temp table:

    $ idxchk test.db

    create index writing_au_idx on writing (author_id)
    ;

    select title from author a
     inner join writing w on a.author_id = w.author_id
      inner join book b on w.book_id = b.book_id
      where a.name = 'Kernighan'

^D

    table   index(es)      column(s)
    ------- -------------- ------------------------------
    author  <pk>           <integer primary key or rowid>
    author  au_name_idx    (name)
    writing <pk>           <integer primary key or rowid>
    writing writing_au_idx (author_id)
    book    <pk>           <integer primary key or rowid>

Note that the query is reordered, the table author is specified first. In this case, SQLite can use an index without opening a table. idxchk prints all accessed tables first, then any other index usage.

Some other considerations:

SQLite is very fast, but not very forgiving of poorly coded JOINs, or lack of proper indexes.
Considering increasing the size of the page cache with PRAGMA cache_size=_nn_. This may even help poorly form JOINs to perform faster.
Usage patterns should be considered, i.e., the cost of maintaining an index for INSERT, UPDATE, and DELETE versus the speed advantage for SELECT. Note that too many indexes, or indexes that never get used during a SELECT, are just as bad as not having enough indexes.

sqlite - Idx Chk