SQLite CVSTrac

*** 27,32 ****
--- 27,133 ----
  
  <p>Place the tables where you can eliminate the most rows by using a where clause (preferably on an indexed column) first, in order to limit the number of JOIN operations required.</p>
  
+ <p>The following is from a message posted by D. Richard Hipp to the SQLite mailing list regarding join translations:</p> 
+ 
+ <p>When SQLite sees this:</p>
+ 
+ <TT><pre>
+     SELECT * FROM a JOIN b ON a.x=b.y;
+ </pre></TT>
+ 
+ <p>It translate it into the following before compiling it:</p>
+ 
+ <TT><pre>
+     SELECT * FROM a, b WHERE a.x=b.y;
+ </pre></TT>
+ 
+ <p>Neither form is more efficient that the other.  Both will generate
+ identical code.  (There are subtle differences on an LEFT OUTER JOIN,
+ but those details can be ignored when you are looking at things at a
+ high level, as we are.)</p>
+ 
+ <p>SQLite implements joins using nested loops with the outer
+ loop formed by the first table in the join and the inner loop formed by
+ the last table in the join.  So for the example above you would have:</p>
+ 
+ <TT><pre>
+     For each row in a:
+       For each row in b such that b.y=a.x:
+         Return the row
+ </pre></TT>
+ 
+ <p>If you reverse the order of the tables in the FROM clause like
+ this:</p>
+ 
+ <TT><pre>
+     SELECT * FROM b, a WHERE a.x=b.y;
+ </pre></TT>
+ 
+ <p>You should get an equivalent result on output, but SQLite will implement
+ the query differently.  Specifically it does this:</p>
+ 
+ <TT><pre>
+     For each row in b:
+       For each row in a such that a.x=b.y:
+         Return the row
+ </pre></TT>
+ 
+ <p>The trick is that you want to arrange the order of tables so that the
+ "such that" clause on the inner loop is able to use an index to jump
+ right to the appropriate row instead of having to do a full table scan.
+ Suppose, for example, that you have an index on a(x) but not on b(y).
+ Then if you do this:</p>
+ 
+ <TT><pre>
+     SELECT * FROM a, b WHERE a.x=b.y;
+ </pre></TT>
+ 
+ <TT><pre>
+     For each row in a:
+       For each row in b such that b.y=a.x:
+         Return the row
+ </pre></TT>
+ 
+ 
+ <p>For each row in a, you have to do a full scan of table b.  So the time
+ complexity will be O(N^2).  But if you reverse the order of the tables
+ in the FROM clause, like this:</p>
+ 
+ <TT><pre>
+     SELECT * FROM b, a WHERE b.y=a.x;
+ </pre></TT>
+ 
+ <TT><pre>
+     For each row in b:
+       For each row in a such that a.x=b.y
+         Return the row
+ </pre></TT>
+ 
+ <p>Now the inner loop is able to use an index to jump directly to the rows
+ in a that it needs and does not need to do a full scan of the table.
+ The time complexity drops to O(NlogN).</p>
+ 
+ <p>So the rule should be:  For every table other than the first, make sure
+ there is a term in the WHERE clause (or the ON or USING clause if that
+ is your preference) that lets the search jump directly to the relavant
+ rows in that table based on the results from tables to the left.</p>
+ 
+ <p>Other database engines with more complex query optimizers will typically
+ attempt to reorder the tables in the FROM clause in order to give you
+ the best result.  SQLite is more simple-minded - it codes whatever you
+ tell it to code.</p>
+ 
+ <p>Before you ask, I'll point out that it makes no different whether you
+ say "a.x=b.y" or "b.y=a.x".  They are equivalent.  All of the following
+ generate the same code:</p>
+ 
+ <TT><pre>
+       ON a.x=b.y
+       ON b.y=a.x
+       WHERE a.x=b.y
+       WHERE b.y=a.x
+ </pre></TT>
+ 
  <H2>Indexes on INTEGER PRIMARY KEY columns (don't do it)</H2>
  
  <p>When you create a column with INTEGER PRIMARY KEY, SQLite uses this column as the key for (index to) the table structure.  This is a hidden index (as it isn't displayed in SQLite_Master table) on this column.  Adding another index on the column is not needed and will never be used.  In addition it will slow INSERT, DELETE and UPDATE operations down.</p>

sqlite - Performance Tuning