Small. Fast. Reliable.
Choose any three.
*** 12,21 ****
  The diagram at the right shows the initial
  state of the system.  All database information
  is held on disk.  Nothing is in the operating
! sytems's cache.  An since the database has not
  yet been opened, nothing has been loading into
  user memory either.
  
  {clear}
  
  ----
--- 12,34 ----
  The diagram at the right shows the initial
  state of the system.  All database information
  is held on disk.  Nothing is in the operating
! sytems's cache.  And since the database has not
  yet been opened, nothing has been loading into
  user memory either.
  
+ The database is divided into fixed-size pages.
+ The default page size is 1024 bytes, but this
+ can be changed either at compile-time or using
+ the {link: /pragma.html#pragma_page_size page_size PRAGMA}.
+ In the diagram on the right, each page of the database
+ is shown as a separate blue block.
+ 
+ Every page in a disk file can be potentially mirrored
+ in the operating system disk cache.  Initially, of course,
+ nothing has been read from disk so nothing has been mirrored
+ in cache.  We show this in the diagram by not coloring in
+ the pages of the database which are not yet in cache.
+ 
  {clear}
  
  ----
***************
*** 32,37 ****
--- 45,55 ----
  with some new data and the two would not necessary
  be consistent.  The shared lock prevents this.
  
+ Even though file locks are associated with disk files,
+ locks are not actually stored on disk.  They are held
+ in the operating system's file cache so that they
+ disappear automatically following a reboot.
+ 
  {clear}
  
  ----
***************
*** 42,47 ****
--- 60,73 ----
  is read into user memory.  The operating system also
  usually makes copies of these pages in its disk cache.
  
+ This example shows a database containing only 8 pages.
+ A real database would likely contain hundreds or
+ thousands or millions of pages.  One of the
+ advantages of using a b-tree based database engine,
+ like SQLite, is that only a small subset of the
+ database containing the information you actually need
+ is read into memory.
+ 
  {clear}
  
  ----
***************
*** 78,85 ****
  into a rollback journal file.  The name of the rollback
  journal is the same as the name of the original database
  with "-journal" appended.  The rollback journal contains
! a short header and some other formatting information but
! is basically just a list of the content of the pages that
  are to be changed.  The rollback journal contains everything
  that is needed to restore the database to its original
  state if the transaction is aborted.
--- 104,112 ----
  into a rollback journal file.  The name of the rollback
  journal is the same as the name of the original database
  with "-journal" appended.  The rollback journal contains
! a short header (shown in green)
! and some other formatting information but
! is otherwise just a list of the content of the pages that
  are to be changed.  The rollback journal contains everything
  that is needed to restore the database to its original
  state if the transaction is aborted.
***************
*** 92,99 ****
  
  Once the original page content has been written to the
  rollback journal, pages of the database in user memory
! can be modified.  Notice that no part of the database
! has yet been modified on disk or in the OS disk cache.
  Also notice that the content of the rollback journal
  might yet only be in the operating systems write cache
  and not yet reached the disk surface.
--- 119,132 ----
  
  Once the original page content has been written to the
  rollback journal, pages of the database in user memory
! can be modified. Modified pages are shown in red.
! 
! Notice that no part of the database
! has yet been changed on disk or in the OS disk cache.
! All of the changes have occurred in local memory only.
! So other processes can continue to read from the database
! file.
! 
  Also notice that the content of the rollback journal
  might yet only be in the operating systems write cache
  and not yet reached the disk surface.
***************
*** 110,115 ****
--- 143,150 ----
  be possible even if there is an unexpected powerloss
  during subsequent parts of the commit process.  This
  is the first "sync" operation that occurs during commit.
+ Actual synchronous disk I/O occurs at this point.  This
+ is one of the two steps that take a lot of time.
  
  {clear}
  
***************
*** 125,130 ****
--- 160,182 ----
  processes are reading since our writes might interfere
  with the reads of the other processes.
  
+ Escalating a reserved lock to an exclusive lock is a
+ really a two-step process.  The first thing that happens
+ si that the reserved lock is promoted to a "pending"
+ lock.  Both reserved and pending locks allow other
+ processes to continue reading.  But a pending lock
+ prevents new processes from starting to read.  Presumably,
+ after setting a pending lock all other processes will
+ eventually finish whatever reading they are doing and
+ drop their shared locks, thus allowing us to continue.
+ The pending lock prevents new processes from starting
+ a read while other processes are finishing.  In this
+ way, a pending lock helps prevent writer starvation.
+ 
+ Once all other processes have finished reading the
+ database (and dropped their shared locks) the
+ exclusive lock can be obtained.
+ 
  {clear}
  
  ----
***************
*** 146,152 ****
  The database changes must be flushed to disk.  Otherwise
  the information might be lost if the operating system
  crashes or if there is an unexpected power failure.
! this is the second "sync" of a commit.
  
  {clear}
  
--- 198,205 ----
  The database changes must be flushed to disk.  Otherwise
  the information might be lost if the operating system
  crashes or if there is an unexpected power failure.
! this is the second "sync" of a commit.  This step, together
! with step 6 above, is why commits are slow.
  
  {clear}
  
***************
*** 162,167 ****
--- 215,231 ----
  time the database is accessed.  After the rollback
  journal has been deleted, however, the transaction
  persists even if power is lost.
+ 
+ Deleting a file is not really an atomic operation,
+ but it appears to be atomic to a user-space program
+ like SQLite.  Following a power failure, if a program
+ (like SQLite) asks whether or not a file exists, it
+ receives back either a "yes" or a "no" as an answer.
+ When SQLite wants to know if a transaction has committed,
+ it asks the OS if the rollback journal exists, and gets
+ back an unambiguous "yes" or "no".  In this sense, the
+ commit is atomic, or at least appears to be from the
+ point of view of a user-level program like SQLite.
  
  {clear}