Small. Fast. Reliable.
Choose any three.
*** 83,90 ****
  *: Stop words
  
  ----
! *Proposed Interface*
! (2006-Apr-28)
  
  A full text index appears to the programmer to be a single
  table.  This table is not directly realized on disk, however.
--- 83,89 ----
  *: Stop words
  
  ----
! **Proposed Programming Interface For Full-Text Search In SQLite**
  
  A full text index appears to the programmer to be a single
  table.  This table is not directly realized on disk, however.
***************
*** 97,235 ****
  real backing store - the backing store just is not a direct
  realization of the index table.
  
! Assuming the full text index is named EX1 then the table
! appears to have the following definition:
  
!    CREATE TABLE ex1(
!      docid INTEGER PRIMARY KEY,
!      doctext TEXT,
!      score REAL,
!      snippet TEXT,
!      pattern TEXT
     );
  
! There is a special API that creates this table.  It is not
! created by executing a CREATE TABLE statement.  The CREATE
! TABLE statement above is there to demonstrate the structure
! of the table, not to show how the table is created.
! 
! The actual storage for the full text index might be in the
! original database file or it might be in auxiliary files.
! That decision is made in the code that implements the virtual
! table and is transparent to the user.
! 
! Add documents to the index by issuing SQL statements like
! this:
! 
!    INSERT INTO ex1(doctxt) VALUES(:docbody);
! 
! Because EX1.DOCID is an integer primary key, a new value
! for the document id is assigned automatically and is accessible
! using sqlite3_last_insert_rowid() in the usual way.
! 
! The EX1.SCORE, EX1.SNIPPET, and EX1.PATTERN columns are
! intended for reading, not writting.  Attempts to write
! anything into those columns (other than NULL) will result
! in a constraint error.
! 
! The full text index may or may not store the original document
! text.  If the original document is not stored, then attempts
! to read the EX1.DOCTEXT column will return a NULL.
! 
! The text segmenter, stemmer, and the stop-word list are
! all built into the indexer.  To use a different segmenter,
! stemmer, or stop-word list, register a different indexer
! object.  If you have a situation where you what multiple
! segmenters to be used in a single index, then you can
! define an additional column in the virtual table that
! specifies which segmenter to use.  For example:
! 
!    CREATE TABLE ex2(
!      docid INTEGER PRIMARY KEY,
!      doctext TEXT,
!      doctype TEXT,  -- Specifies which segmenter to use
!      score REAL,
!      snippet TEXT,
!      pattern TEXT
!    );
! 
! Then
! 
!    INSERT INTO ex2(doctype,doctext) VALUES('html', :docbody);
  
! The knowledge of what segmenter names are valid will
! still have to be built into the virtual table object.
! If an unknown document type is specified, the virtual
! table object will raise an exception on the insert.
! 
! This concept of using a table as an interface instead of
! as bulk storage is unsettling to some people at first.  But
! the more you think about it, the more the concept will grow
! on you.
! 
! Documents can be removed from the index using
! 
!    DELETE FROM ex1 WHERE docid=:id
! 
! To search for documents, do a select with a WHERE clause
! constraint on the EX1.PATTERN column, like this:
! 
!    SELECT docid FROM ex1
!     WHERE pattern='"George Washington" slept here';
! 
! The EX1.PATTERN column does not represent anything that
! is actually stored.  This is just a way to specify the search
! pattern.  If you query for the EX1.PATTERN column you will
! always get back a NULL.  The EX1.PATTERN column is really
! only useful in WHERE clause constraints.  In any other context
! it always appears to be NULL.
! 
! The string that EX1.PATTERN is compared against will be
! parsed by the indexer.  In this case, we have the two
! keywords George and Washington in double-quotes, indicating
! that this is a phrase search.  Only documents where these
! words occur together will match.  Other kinds of search
! syntaxes can be used.  It is up to the indexer implementation
! to parser and interpret the search pattern.  To SQLite,
! the pattern is just a string.
! 
! The EX1 table has some unusual properties (such as the
! strange behavior of the PATTERN column) but in other respects
! it can be used like any other table.  The EX1 table can appear
! in a JOIN or a subquery just like any other table.  And you
! can create VIEWs on the EX1 table.  But you cannot DROP the
! EX1 table or create additional indices on it.
! 
! Additional information about the search result is available
! from the SCORE and SNIPPET columns.  The SCORE is some kind
! of score attached to the pattern match and SNIPPET is a
! context snippet for the match.  These are not real columns.
! Both are computed only if requested.  But they can be used
! like any other column.  Consider this example:
! 
!    SELECT docinfo.title, ex1.snippet, docinfo.link
       FROM ex1 JOIN docinfo USING(docid)
!     WHERE ex1.pattern='"George Washington" slept here'
!       AND ex1.score>0.5
!     ORDER BY docinfo.date DESC, ex1.score DESC
!     LIMIT 20;
! 
! All of this is implemented using a the VirtualTables interface
! to SQLite.
! 
! *Advantages and Disadvantages Of The Interface Proposal Above*
! 
! Advantages:
! 
! *: The VirtualTables mechanism has been suggested before and has
! other uses outside of full-text search.
! 
! *: Because the full-text search is a generic add-on, we can
! experiment with different full-text search algorithms.  Users
! can add their own algorithms simply by adding a new or modified
! virtual table.  There is no need to hack on the SQLite core in order
! to substitute an alternative full-text search algorithm.
! 
! Disadvantages:
  
! *: No other database engine that we know of does it this way.
--- 96,380 ----
  real backing store - the backing store just is not a direct
  realization of the index table.
  
! A full text index is created by calling the following
! C API on an open database connection:
  
!    int sqlite3fts1_create(
!      sqlite3 *db,
!      const char *zTableName,
!      sqlite3fts1_stemmer *xStemmerFunc,
!      const char **azSubfields,
!      int flags
     );
  
! The db parameter is the pointer to an open SQLite database
! in which the full text index already exists or is to be
! created.  zTableName is the name of the virtual table that
! will represent the full text index.  xStemmerFunc is a
! user-supplied function for segmenting and stemming the
! input text.  The details of the interface to this stemmer
! are to be determined.  If the xStemmerFunc parameter is NULL
! then a default stemmer is used.
! The azSubfields parameter, if not NULL, is a list of auxiliary
! fields associated with each document.  The list is terminated
! by a NULL pointer.  More on this subject in the sequel.
! The flags parameter is an ORed set of flags that control
! optional features of the full text index.</p>
! 
! The sqlite3fts1_create() function creates a new virtual table
! named according to zTableName.  It also creates, if they do not
! exist, several real tables and indices that serve as the backing
! store for the index.  The real tables and indices are given names
! that are prefixed with zTableName.  The real tables are intended
! for use by the virtual table only.  Programmers should not attempt
! to make direct use of the real tables.  The content of these
! backing-store tables is subject to change without notice and
! is not especially useful to the programmer.  Direct modification
! of the backing-store tables will likely cause the full text
! index to malfunction.
! 
! When sqlite3fts1_create() is called, a new full text index
! is created if it does not already exist.  If the full text
! index does already exist then only the virtual table interface
! is created.  When calling sqlite3fts1_create() for an existing
! full text index, one must use the exact same set of parameters
! that originally created the full text index.
! 
! The virtual table that represents the full text index contains
! the columns listed below at a minimum.  Additional columns might
! also be available depending on options.
! 
! <html>
! <table border="0" cellspacing="15">
! <tr>
! <td valign="top">docid</td>
! <td>
! A document id is assigned to each document as it is added
! to the index.  Document ids are monotonically increasing
! and are limited to 32 bits.
! The docid column serves as the INTEGER PRIMARY KEY for the
! virtual table.  After inserting a document, the id assigned
! to that document is available through the
! sqlite3_last_insert_rowid() API.
! The docid column is read-only.  The docid is created
! automatically by the virtual table and cannot be modified
! by the application.
! </td>
! </tr>
! <tr>
! <td valign="top">doctext</td>
! <td>
! The complete text of a document is inserted into this
! column in order to add a document to the full text index.
! This is an insert-only column.  Attempts to read this column
! will return NULL.  Attempts to update this column will raise
! an exception.
! </td>
! </tr>
! <tr>
! <td valign="top">pattern</td>
! <td>
! This column is used as a constraint in the WHERE clause of
! a SELECT statement in order to invoke a full text search.
! Attempts to read this column in any context other than
! a WHERE clause will return NULL.  Attempts to write this
! column will raise an exception.
! </td>
! </tr>
! <tr>
! <td valign="top">score</td>
! <td>
! When performing a full text search, this column returns
! a integer which reflects the quality of the current match.
! Higher numbers are better.
! </td>
! </tr>
! </table>
! 
! <p>Suppose a full text index has been created and named "ex1".
! To insert a new document into the index, invoke the following SQL:
! </p>
! 
! <blockquote><pre>
!   INSERT INTO ex1(doctext) VALUES(:txt);
!   SELECT last_insert_rowid();
! </pre></blockquote>
! 
! <p>Bind the ":txt" parameter to the complete text of the document
! to be inserted, of course.  The INSERT statement will cause the
! full text indexer to read and segment the document and make
! appropriate entries in its backing-store tables to enable it
! to find this document again based on a keyword search of its
! content.  A new document ID will be computed automatically and
! made available via the sqlite3_last_insert_rowid() interface.</p>
! 
! <p>
! To remove a document from the full text index, use a DELETE statement
! as follows:
! </p>
! 
! <blockquote><pre>
!    DELETE FROM ex1 WHERE docid=:docid
! </pre></blockquote>
! 
! <p>Bind the document ID to the :docid parameter before executing the
! above statement, of course.</p>
! 
! <p>To find all documents that match a particular search pattern,
! use a SELECT statement as follows:</p>
! 
! <blockquote><pre>
!    SELECT docid, score FROM ex1 WHERE pattern=:pattern
! </pre></blockquote>
! 
! <p>In this case, bind the :pattern parameter to the pattern of keywords
! you are searching for.  The SELECT statement will return the document ID
! and a score for each matching document.  The score will be an integer
! which is larger for better matches.</p>
! 
! <p>Queries against the virtual ex1 table can appear in joins and
! in subqueries.  For example, to sort the results by date, one might
! do a join as follows:</p>
  
! <blockquote><pre>
!    SELECT docinfo.docname
       FROM ex1 JOIN docinfo USING(docid)
!     WHERE pattern=:pattern
!     ORDER BY docinfo.date DESC
!     LIMIT 20
! </pre></blockquote>
! 
! <h3>Search Patterns</h3>
! 
! <p>A search pattern is a white-space separated list of keywords.
! Each keyword is passed through the stemmer before being submitted
! to the search engine.  Hence, if the stemmer does case folding then
! the search will be case insensitive and if the stemmer does not do
! case folding then the search will be case sensitive.  (If you use
! a different stemmer to search than was used to build the index then
! the results will be unpredictable and probably incorrect.)</p>
! 
! <p>Keywords separated by whitespace are assumed to be connected
! by a logical AND operator.  You can also explicitly specify an
! AND if desired.  The AND operator must be in all-caps.   Keywords
! may also be separated by "OR" which means that documents will
! match that contain either the word to the left or the right of
! the "OR" (or both). The OR operator binds more tightly than
! AND.  Use the "NOT" or a minus sign prefix to
! specify words which disqualify a document from consideration.
! Parentheses may be used to group operators.</p>
! 
! <p>A sequence of keywords in double-quotes is a phrase.
! A phrase will match only if all keywords appear in the document
! in consecutive order.</p>
! 
! <p>Glob characters "*", "?", and "[...]" can be used within
! keywords.  The result is as if the search pattern had been
! expanded to include all words that match the glob pattern
! separated by OR.  Be warned that wildcards at or near the beginning
! of a keyword or which match many words in the document lexicon
! can lead to long search times.  Also note that wildcards might
! confuse the stemmer.</p>
! 
! <p>Any non-alphanumeric characters other than those mentioned
! above are passed to the stemmer.  If the stemmer chooses to
! ignore them then they do not enter into the search.  But if
! the stemmer does handle or accept the characters, then they
! become part of the search keyword.</p>
! 
! <p>Mismatched parentheses and other syntax errors in a search
! pattern are ignore.
! Search patterns never raise an exception no matter how badly formed.
! The full text search engine muddles through the best it can.</p>
! 
! <h3>Options</h3>
! 
! <p>The flags parameter to the sqlite3fts1_create() function
! is an ORed combination of the following flags:</p>
! 
! <table border="0" cellspacing="15">
! <tr>
! <td valign="top">SQLITE3FTS1_OFFSETS</td>
! <td>
! If present, the virtual table will contain a read-only column named
! "offsets" that will supply the byte-offsets in the original document
! text to the beginning of each matching word in a match result.
! </td>
! </tr>
! <tr>
! <td valign="top">SQLITE3FTS1_SNIPPET</td>
! <td>
! If present, the virtual table will contain a read-only column
! named "snippet" which will return a string showing matching
! words in context for each match.
! </td>
! </tr>
! </table>
! 
! <h3>Subfields</h3>
! 
! <p>The azSubfields parameter to sqlite3fts1_create() is a NULL-terminated
! list of pointers to strings where each string is the name of a
! subfield within the document text.  If the azSubfields parameter is
! itself a NULL pointer or if it points to an empty list then subfields
! will not be supported by the full text index.</p>
! 
! <p>If subfields are supported, then the virtual table contains
! additional columns for each subfield.  If the name of a subfield
! is "xyz" then the new columns are named "xyz" and "xyz_boost".</p>
! 
! <p>The "xyz" column is write-only and is used when a new document
! is added to the full text index.  The "xyz" column is a text
! field contain a list of pairs of integer offsets into the body of the
! document.  Each pair of offsets specifies the beginning and end
! of a substring in the document which are the contents of the
! specified field.  The "xyz_boost" column is used in the WHERE clause
! of queries to specify multiplier for the score of keywords that
! are contained within the subfield.</p>
! 
! <p>When performing a search on an index with subfields, one can
! restrict keywords to appear only in that subfield by prepending the
! name of the subfield and a colon to the keyword.</p>
! 
! <p>For example, suppose the full text index is being used for an
! email application.  The subfields might be "subject", "to", and "from".
! As each new email messages is being added to the index, strings containing
! integer pairs are inserted into the "subject", "to", and "from" columns
! that delimit the parts of the email message that contain the text of
! the subject, the recipients names and address, and the senders
! name and address.</p>
! 
! <blockquote><pre>
!     INSERT INTO emailidx(doctext,subject,to,from)
!      VALUES(:doctext, :subject, :to, :from)
! </pre></blockquote>
! 
! <p>In this example, the :doctext parameter is bound to the complete
! text of the email message as before.  The :subject parameter might
! be bound to a string like "134 168" to indicate that the text of
! the subject line occurs between characters 134 and 168 of :doctext.
! Similar bindings on :to and :from specify the locations in :doctext
! where the recipients and sender can be found.  If a subfield occurs
! in more than one place in the body of the document, then additional
! offsets can be added to the list.</p>
! 
! <p>When querying, additional weight can be given to words that
! match the subject line, for example, by specifying a subject_boost
! in the WHERE clause:</p>
! 
! <blockquote><pre>
!    SELECT docid, score FROM emailidx
!     WHERE pattern=:pattern AND subject_boost=2.5
! </pre></blockquote>
! 
! <p>The search string can contains keywords prefixed by "subject:"
! or "to:" or "from:" meaning that those keywords must occur within
! the specified subfields in order to match.  For example, to find
! discussion of full text indexing on the SQLite mailing list, one
! might search for:</p>
! 
! <blockquote>
!   subject:sqlite "full text index"
! </blockquote>
  
! </html>