*** 24,29 ****
--- 24,33 ----
  I prefer not to expose SQLite to patent risk.  The current
  implementation uses 17+ year old technology exclusively.

+ _::Kent used these algorithms in 1985 [J Kent, H Garcia-Molina, and J Chung "An experimental evaluation of crash recovery mechanisms" In ACM PODS pages 113-121, 1985; J M Kent "Performance and Implementation Issues in Database
+ Crash Recovery" PhD thesis, Princeton University, 1985]. The ideas can be traced back through System R [J Gray, P McJones, M Blasgen, B Lindsay, R Lorie, T Price, F Putzolu, and I Traiger, "The recovery manager of the System R
+ database manager", Computing Surveys 13(2), 1981] and [R A Lorie "Physical integrity in a large segmented database" ACM Transactions on Database Systems 2(1), 1977] -- e (2004-04-14)
+
  2: The shadow pager assumes that a one-sector disk write is an atomic
  operation.  Is that true of modern disks?  I don't know but I'm guessing
  not.  When a power failure occurs, the voltage to the disk controller
***************
*** 37,42 ****
--- 41,48 ----
  least meta-data journalling, a partial sector write does not corrupt
  the database, as far as I am aware.

+ _:: Assuming that an SQLite page is bigger than a disk sector (typically 512 bytes), and that disk sectors do not span SQLite pages (true if both sizes are powers of 2), a partially written sector can only occur in a free page or in page 1. I mention in the attached pdf that it might be a good idea to "write ahead log" writes to page 1. Note that if either of the above assumptions is not true, the present SQLite will have the same database corruption issues on power failure. -- e (2004-04-14)
+
  3: Unless I missed something, the shadow pager allows a single write to
  occur while a read is ongoing, but not two consecutive writes.
  For example, the following is allowed:
***************
*** 56,61 ****
--- 62,69 ----
  I cannot think of a method for locking the database in a way that
  prevents a second write from starting while there are still pending
  reads.
+
+ _:: No. The design described will permit both of the above scenarios. The additions to the free list (or bit vector) made by each write transaction are accumulated separately, and are only used once the read transaction completes. This is handled by ORing together the free bit-vectors of transactions completed before any pending reads began. This operation is performed when each write transaction begins. -- e (2004-04-14)

  Ben Carlyle's proposal (or variations thereof) allows a write operation to
  start while reads are still ongoing.  The write cannot commit until