Small. Fast. Reliable.
Choose any three.
*** 81,83 ****
--- 81,218 ----
  *: Punctuation matching
  *: Stemming
  *: Stop words
+ 
+ ----
+ *Proposed Interface*
+ (2006-Apr-28)
+ 
+ A full text index appears to the programmer to be a single
+ table.  This table is not directly realized on disk, however.
+ It is a virtual table - similar in concept to a VIEW except
+ that it is implemented using C code rather than as a
+ SELECT statement.  See VirtualTables for more information.
+ The full text index table is an interface
+ to C routines that do the appropriate searching, not a
+ materialized table.  Obviously, there must be some kind of
+ real backing store - the backing store just is not a direct
+ realization of the index table.
+ 
+ Assuming the full text index is named EX1 then the table
+ appears to have the following definition:
+ 
+    CREATE TABLE ex1(
+      docid INTEGER PRIMARY KEY,
+      doctext TEXT,
+      score REAL,
+      snippet TEXT,
+      pattern TEXT
+    );
+ 
+ There is a special API that creates this table.  It is not
+ created by executing a CREATE TABLE statement.  The CREATE
+ TABLE statement above is there to demonstrate the structure
+ of the table, not to show how the table is created.
+ 
+ The actual storage for the full text index might be in the
+ original database file or it might be in auxiliary files.
+ That decision is made in the code that implements the virtual
+ table and is transparent to the user.
+ 
+ Add documents to the index by issuing SQL statements like
+ this:
+ 
+    INSERT INTO ex1(doctxt) VALUES(:docbody);
+ 
+ Because EX1.DOCID is an integer primary key, a new value
+ for the document id is assigned automatically and is accessible
+ using sqlite3_last_insert_rowid() in the usual way.
+ 
+ The EX1.SCORE, EX1.SNIPPET, and EX1.PATTERN columns are
+ intended for reading, not writting.  Attempts to write
+ anything into those columns (other than NULL) will result
+ in a constraint error.
+ 
+ The full text index may or may not store the original document
+ text.  If the original document is not stored, then attempts
+ to read the EX1.DOCTEXT column will return a NULL.
+ 
+ The text segmenter, stemmer, and the stop-word list are
+ all built into the indexer.  To use a different segmenter,
+ stemmer, or stop-word list, register a different indexer
+ object.  If you have a situation where you what multiple
+ segmenters to be used in a single index, then you can
+ define an additional column in the virtual table that
+ specifies which segmenter to use.  For example:
+ 
+    CREATE TABLE ex2(
+      docid INTEGER PRIMARY KEY,
+      doctext TEXT,
+      doctype TEXT,  -- Specifies which segmenter to use
+      score REAL,
+      snippet TEXT,
+      pattern TEXT
+    );
+ 
+ Then
+ 
+    INSERT INTO ex2(doctype,doctext) VALUES('html', :docbody);
+ 
+ The knowledge of what segmenter names are valid will
+ still have to be built into the virtual table object.
+ If an unknown document type is specified, the virtual
+ table object will raise an exception on the insert.
+ 
+ This concept of using a table as an interface instead of
+ as bulk storage is unsettling to some people at first.  But
+ the more you think about it, the more the concept will grow
+ on you.
+ 
+ Documents can be removed from the index using
+ 
+    DELETE FROM ex1 WHERE docid=:id
+ 
+ To search for documents, do a select with a WHERE clause
+ constraint on the EX1.PATTERN column, like this:
+ 
+    SELECT docid FROM ex1
+     WHERE pattern='"George Washington" slept here';
+ 
+ The EX1.PATTERN column does not represent anything that
+ is actually stored.  This is just a way to specify the search
+ pattern.  If you query for the EX1.PATTERN column you will
+ always get back a NULL.  The EX1.PATTERN column is really
+ only useful in WHERE clause constraints.  In any other context
+ it always appears to be NULL.
+ 
+ The string that EX1.PATTERN is compared against will be
+ parsed by the indexer.  In this case, we have the two
+ keywords George and Washington in double-quotes, indicating
+ that this is a phrase search.  Only documents where these
+ words occur together will match.  Other kinds of search
+ syntaxes can be used.  It is up to the indexer implementation
+ to parser and interpret the search pattern.  To SQLite,
+ the pattern is just a string.
+ 
+ The EX1 table has some unusual properties (such as the
+ strange behavior of the PATTERN column) but in other respects
+ it can be used like any other table.  The EX1 table can appear
+ in a JOIN or a subquery just like any other table.  And you
+ can create VIEWs on the EX1 table.  But you cannot DROP the
+ EX1 table or create additional indices on it.
+ 
+ Additional information about the search result is available
+ from the SCORE and SNIPPET columns.  The SCORE is some kind
+ of score attached to the pattern match and SNIPPET is a
+ context snippet for the match.  These are not real columns.
+ Both are computed only if requested.  But they can be used
+ like any other column.  Consider this example:
+ 
+    SELECT docinfo.title, ex1.snippet, docinfo.link
+      FROM ex1 JOIN docinfo USING(docid)
+     WHERE ex1.pattern='"George Washington" slept here'
+       AND ex1.score>0.5
+     ORDER BY docinfo.date DESC, ex1.score DESC
+     LIMIT 20;
+ 
+ All of this is implemented using a the VirtualTables interface
+ to SQLite.