Page History
- 2007-Jun-28 09:39 anonymous
- 2007-Jan-30 00:03 anonymous
- 2007-Jan-29 23:58 anonymous
- 2007-Jan-29 17:04 anonymous
- 2007-Jan-29 16:40 anonymous
- 2006-Oct-29 07:26 anonymous
- 2006-Jun-29 14:52 anonymous
- 2006-Jun-29 08:26 anonymous
- 2006-May-20 14:56 anonymous
- 2006-Mar-31 09:07 anonymous
- 2006-Mar-31 09:06 anonymous
- 2006-Mar-31 09:06 anonymous
- 2005-Sep-01 19:02 anonymous
- 2005-Apr-24 19:09 anonymous
- 2005-Mar-06 21:53 anonymous
- 2005-Mar-04 23:43 anonymous
- 2005-Mar-04 14:52 anonymous
- 2004-Aug-27 23:31 anonymous
- 2004-Aug-17 14:06 anonymous
- 2004-Aug-01 17:47 anonymous
- 2004-Apr-14 15:31 dougcurrie
- 2004-Apr-14 12:01 drh
- 2004-Apr-13 02:35 anonymous
- 2004-Apr-05 22:41 dougcurrie
- 2004-Mar-31 04:29 dougcurrie
- 2004-Mar-31 03:53 anonymous
- 2004-Mar-31 03:49 anonymous
- Read transactions are never blocked by any other transactions
- Write transactions are never blocked by Read transactions
So, the only blocking that takes place is that Write transactions block other Write transactions.
The cost of this new freedom for read transactions is data duplication. Write transactions always write to free disk pages, so the written data cannot interfere with read transactions. The space taken by the old version(s) of the data cannot be reclaimed until the read transaction(s) begun before the new versions were committed complete. So, the amount of data duplication is application dependent, but is expected to be small in practice.
Note that since the prior version of all written data remains in the database, there is no need for a journal file. So in the best case, with quick read transactions and therefore low data duplication, the shadow pager will use the same disk space as the present journaling SQLite implementation.
See the attached pdf for more details.
If Tatu Ylonen's web page is unavailable, a cached version of his referenced paper is available here.
By drh on 2004-Apr-14:
I have three issues with the shadow pager idea, in no particular order:
- The algorithm is less than 17 years old. That means that for all I know, it could be patented. Perhaps there is a submarine patent. Who knows. I prefer not to expose SQLite to patent risk. The current implementation uses 17+ year old technology exclusively.
- The shadow pager assumes that a one-sector disk write is an atomic operation. Is that true of modern disks? I don't know but I'm guessing not. When a power failure occurs, the voltage to the disk controller begins to decay. At some point, the voltage drops below the point where the disk controller can function. If that point happens to be when the first 1/3rd of the sector has been written, it seems to me that the next 2/3rds of the second would be unchanged, or perhaps overwritten with garbage. The current implementation might lose the entire database file if a power lose occurs during the write of an i-node on an unjournalled file system, but on a filesystem with at least meta-data journalling, a partial sector write does not corrupt the database, as far as I am aware.
- Unless I missed something, the shadow pager allows a single write to occur while a read is ongoing, but not two consecutive writes. For example, the following is allowed:
|-- write ---| |--------------------------- read ----------------------|
-
But this is now allows:
|-- write ---| |--- write ---| |--------------------------- read ----------------------|
-
The reason that the second version will not work is that the first
write may have added formerly used blocks to the freelist and the
second write might overwrite those freelist blocks while the read
is still using them, resulting in data corruption. Unfortunately,
I cannot think of a method for locking the database in a way that
prevents a second write from starting while there are still pending
reads.
Ben Carlyle's proposal (or variations thereof) allows a write operation to start while reads are still ongoing. The write cannot commit until all the reads finish. But this still gives you some of what the shadow pager offers. Specifically:
|-- write ---| |-------- read -------|
The current implemention requires that all reads end before the write begins. Like this:
|-- write ---| |--- read ---|
The current plan is to stay conservative and implement a modified version of the Carlyle proposal in version 3.0.
Attachments:
- shadow.pdf 55188 bytes added by anonymous on 2004-Mar-31 03:50:15 UTC.