*** 2,7 ****
--- 2,15 ----
  
  ----
  
+ *Term dictionary for frequency and other info*
+ 
+ At some point, we're likely to add certain per-term information so that we can optimize queries.  For instance, we might like to order AND merges from smallest doclist to largest.  This may also allow for more efficient encoding.  The term dictionary could be encoded similarly to the current btree, but delta-encoding should work better because longer runs of terms would sit together.  The leaf-node encoding would only need to store the termid of the first term; the others can be assumed to increment, so leaves would store straight doclist data.
+ 
+ The encoding gain is probably minimal, because it only helps if leaf nodes are broken very frequently, requiring full terms to be encoded.  That doesn't seem to happen often relative to the absolute number of terms.  The gains probably improve as segments grow in size, due to standalone leaf nodes and greater density of terms.  Regardless, I suspect that encoding frequency information may be more or less free.
+ 
+ ----
+ 
  *Incremental merges*
  
  Except for level 0, when any level fills up we don't need to *immediately* merge it to the next level.  We just need to have finished that merge before the next incoming merge hits.  This means that when merging the 4kdoc segments into a 64kdoc segment, we can split it across 4k incremental steps.
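
As a rough illustration of the idea above, the sketch below spreads a merge of several sorted segments over a fixed number of steps, so that each incoming document pays for only a small slice of the work. Everything here is hypothetical: the `IncrementalMerge` class, its `step()` method, and the use of sorted `(term, docid)` lists as stand-ins for segments are assumptions made for illustration, not the actual segment format or merge code.

```python
import heapq
from itertools import islice


class IncrementalMerge:
    """Hypothetical helper: merge sorted segments a slice at a time."""

    def __init__(self, segments, steps):
        total = sum(len(seg) for seg in segments)
        # Ceiling division, so that `steps` calls to step() always suffice.
        self.per_step = max(1, -(-total // steps))
        # heapq.merge is lazy: nothing is consumed until step() pulls from it.
        self.stream = heapq.merge(*segments)
        self.output = []
        self.remaining = total

    def step(self):
        """Do one slice of the merge; return True once the merge is done."""
        chunk = list(islice(self.stream, self.per_step))
        self.output.extend(chunk)
        self.remaining -= len(chunk)
        return self.remaining == 0


# Toy stand-in for a full level: 16 small sorted segments of
# (term, docid) postings, merged across 4 steps instead of 4k.
segments = [
    sorted((f"term{(i * 7 + j) % 10}", i * 100 + j) for j in range(8))
    for i in range(16)
]
merge = IncrementalMerge(segments, steps=4)

steps_taken = 0
done = False
while not done:
    # In the real design, each step would ride along with an incoming
    # document insert rather than run in a tight loop.
    done = merge.step()
    steps_taken += 1
```

The only hard requirement this models is the deadline: the merge must be complete before the next segment arrives at that level. How the work is divided between steps is a free choice; equal-sized slices are just the simplest option.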