*** 2,7 **** --- 2,19 ---- ---- + *Prefix-encode first terms* + + At this time, interior and leaf nodes fully-encode their first terms, and prefix-encode later terms. If we passed the key which decided which child to take down from the interior node, we could prefix-encode the first term against that. Thus, if we have runs of very long terms, we could save the initial hit from fully-encoding the term. + + The main flaw is handling the leftmost term. The first child of an interior node has no term in the interior node. For interior nodes which are themselves not leftmost in their parents, the term would be the one passed down from the parent. But for the leftmost arc in the tree, there is no such term. + + A solution would be to store the leftmost term of the tree at the root (in the segment directory). If this term is large, it could cause fragmentation of the segment directory, which would be negative for the entire tree. A solution to _that_ would be to cap the length. Either way, it's possible/likely that most of the term will never be used in the first place, because it won't share much prefix with the first term of the next leaf node (and even less as you rise through the tree). + + The best solution I've thought of would be to push the shared prefix from the first two terms of a node up. Long terms would thus be equitably distributed across the leftmost arc of the tree. This has the downside of making encoding much more complicated, since you can't encode the first term until you know the shared prefix from the parent node, etc. Probably not worthwhile at this time, but may be worthwhile if/when we start using a separate term directory (see a later idea on this list). + + ---- + *Integrate fts2 storage with sqlite btree* Convert %_segments to have fields for level, idx, and term0, and pull term0 out of the leaf nodes. In this form, interior nodes dissappear entirely, as does the segment directory. Segment merges would be pretty much unchanged. This could be a substantial improvement when implementing incremental merging, since it removes a layer of management from the system. Main concern is whether the additional btree will slow things down too much - the test immediately before fts2 used parallel btrees, and was still 10x slower than fts2 ended up.