Small. Fast. Reliable.
Choose any three.
This page is a sequel to SqliteTransaction. Please read SqliteTransaction now if you have not read it already.

Suppose a power loss or operating-system crash occurs in the middle of a transaction commit in SQLite. Only part of the changes associated with the commit will have been written to disk. This page overviews how the database is restored to its original state and made to appear as if nothing had ever been modified in the first place.


rollback-0.gif Initial State

The diagram on the right shows the initial state of the disk following a power failure that occurred in the middle of a commit. We had intended to modify three pages of the database file, but only one page was fully written and another page was partially written. So the database is in an inconsistent state. (The writes are shown in red.) The rollback journal still exists as a separate file on disk.



rollback-1.gif Step 1

The next time SQLite tries to open the database file, it first obtains a shared lock (as described in Step 1 of SqliteTransaction). After obtaining a shared lock, SQLite always checks to see if there is a "hot journal" file. A hot journal is a rollback journal that is not part of a ongoing write by another process and which has not been deleted.

In this example, there is a hot journal.



rollback-2.gif Step 2

Upon seeing a hot journal, SQLite realizes that it has to restore the database to its state prior to the start of the failed transaction before continuing. So it obtains an exclusive lock on the database file. The exclusive lock prevents two or more processes from trying to rollback the same hot journal at the same time.

If the process is unable to get an exclusive lock, it backs off. It releases its shared lock and returns SQLITE_BUSY. Presumably it will try again to get its shared lock after a delay of a few milliseconds, and by that point some other process will have already completed the recovery.



rollback-3.gif Step 3

After we have an exclusive lock on the database, the original content of all database pages that were to be changed by the transaction that failed are read from the rollback journal and written into the database file. This restores the database file to the same content it held prior to the start of the transaction.

We have not shown it here, but a sync operation must be performed in order to force the database out to the disk surface prior to the next step.



rollback-4.gif Step 4

Once the database file is restored to its original content prior to the start of the transaction, the rollback journal can be deleted. The failed transaction has been completely rolled back so the journal is no longer needed.



rollback-5.gif Step 5

Finally, the lock on the file can drop back from exclusive to shared and the database connection can continue as normal.

SQLite performs all of the above steps automatically and transparently on the first attempt to read a database that has suffered a failed transaction. The programming that is calling SQLite never knows that anything unusual has taken place. From the point of view of the SQLite client, it is as if the database had never been corrupted by a power failure or OS crash in the first place.



Attachments: