*** 27,32 **** --- 27,133 ---- <p>Place the tables where you can eliminate the most rows by using a where clause (preferably on an indexed column) first, in order to limit the number of JOIN operations required.</p> + <p>The following is from a message posted by D. Richard Hipp to the SQLite mailing list regarding join translations:</p> + + <p>When SQLite sees this:</p> + + <TT><pre> + SELECT * FROM a JOIN b ON a.x=b.y; + </pre></TT> + + <p>It translate it into the following before compiling it:</p> + + <TT><pre> + SELECT * FROM a, b WHERE a.x=b.y; + </pre></TT> + + <p>Neither form is more efficient that the other. Both will generate + identical code. (There are subtle differences on an LEFT OUTER JOIN, + but those details can be ignored when you are looking at things at a + high level, as we are.)</p> + + <p>SQLite implements joins using nested loops with the outer + loop formed by the first table in the join and the inner loop formed by + the last table in the join. So for the example above you would have:</p> + + <TT><pre> + For each row in a: + For each row in b such that b.y=a.x: + Return the row + </pre></TT> + + <p>If you reverse the order of the tables in the FROM clause like + this:</p> + + <TT><pre> + SELECT * FROM b, a WHERE a.x=b.y; + </pre></TT> + + <p>You should get an equivalent result on output, but SQLite will implement + the query differently. Specifically it does this:</p> + + <TT><pre> + For each row in b: + For each row in a such that a.x=b.y: + Return the row + </pre></TT> + + <p>The trick is that you want to arrange the order of tables so that the + "such that" clause on the inner loop is able to use an index to jump + right to the appropriate row instead of having to do a full table scan. + Suppose, for example, that you have an index on a(x) but not on b(y). + Then if you do this:</p> + + <TT><pre> + SELECT * FROM a, b WHERE a.x=b.y; + </pre></TT> + + <TT><pre> + For each row in a: + For each row in b such that b.y=a.x: + Return the row + </pre></TT> + + + <p>For each row in a, you have to do a full scan of table b. So the time + complexity will be O(N^2). But if you reverse the order of the tables + in the FROM clause, like this:</p> + + <TT><pre> + SELECT * FROM b, a WHERE b.y=a.x; + </pre></TT> + + <TT><pre> + For each row in b: + For each row in a such that a.x=b.y + Return the row + </pre></TT> + + <p>Now the inner loop is able to use an index to jump directly to the rows + in a that it needs and does not need to do a full scan of the table. + The time complexity drops to O(NlogN).</p> + + <p>So the rule should be: For every table other than the first, make sure + there is a term in the WHERE clause (or the ON or USING clause if that + is your preference) that lets the search jump directly to the relavant + rows in that table based on the results from tables to the left.</p> + + <p>Other database engines with more complex query optimizers will typically + attempt to reorder the tables in the FROM clause in order to give you + the best result. SQLite is more simple-minded - it codes whatever you + tell it to code.</p> + + <p>Before you ask, I'll point out that it makes no different whether you + say "a.x=b.y" or "b.y=a.x". They are equivalent. All of the following + generate the same code:</p> + + <TT><pre> + ON a.x=b.y + ON b.y=a.x + WHERE a.x=b.y + WHERE b.y=a.x + </pre></TT> + <H2>Indexes on INTEGER PRIMARY KEY columns (don't do it)</H2> <p>When you create a column with INTEGER PRIMARY KEY, SQLite uses this column as the key for (index to) the table structure. This is a hidden index (as it isn't displayed in SQLite_Master table) on this column. Adding another index on the column is not needed and will never be used. In addition it will slow INSERT, DELETE and UPDATE operations down.</p>