Database Durability

Durability guarantees that once a transaction is committed, its changes are permanently stored in the database, even in the event of a crash. But crashes do happen - what should happen if a crash occurs while a transaction is being committed? How does a database ensure durability in such a case?

Each action could be written to disk as it happens - but this breaks atomicity if the transaction only partially completes. Therefore, any transaction that begins must either complete or be rolled back. Databases handle this in different ways; two popular approaches are shadow copies/pages and a write-ahead log (WAL).

Shadow Copies/Pages

Shadow copies involve creating a copy of all or part of the database, and making all changes against the copy. Should an error occur, the copy is simply discarded. When the transaction completes, the old data is replaced with the new data via an atomic operation (such as a rename).
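
To make this concrete, here is a minimal sketch of the shadow-copy approach, assuming a database small enough to live in a single JSON file (the file layout and helper name are illustrative):

    import json
    import os

    def commit_with_shadow_copy(db_path, transaction):
        """Apply `transaction` (a function that mutates the data) to a shadow
        copy, then atomically rename the copy over the original. A crash
        before the rename leaves the old database untouched."""
        with open(db_path) as f:
            data = json.load(f)

        transaction(data)  # all changes happen on the in-memory shadow copy

        shadow_path = db_path + ".shadow"
        with open(shadow_path, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make the shadow copy durable first

        os.replace(shadow_path, db_path)  # atomic rename: old data swapped for new

If the process crashes at any point before the final rename, the original file is still intact and the leftover shadow file can simply be deleted.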

Write-Ahead Logging

Write each action to a log. On crash, any unfinished transaction can be completed or rolled back from the log. The WAL is a sequence of actions, each of which must be written to storage before the corresponding operation is applied to the database.
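
A sketch of this rule, assuming a toy key-value store kept in memory (the class name and record format are illustrative):

    import json
    import os

    class SimpleWAL:
        """Append each action to a log and fsync it *before* applying the
        action to the in-memory store; replaying the log after a crash
        rebuilds the store."""

        def __init__(self, log_path):
            self.log_path = log_path
            self.store = {}
            self._replay()

        def _replay(self):
            if not os.path.exists(self.log_path):
                return
            with open(self.log_path) as f:
                for line in f:
                    record = json.loads(line)
                    self.store[record["key"]] = record["value"]

        def put(self, key, value):
            record = {"key": key, "value": value}
            with open(self.log_path, "a") as f:
                f.write(json.dumps(record) + "\n")
                f.flush()
                os.fsync(f.fileno())  # the log hits disk before the store changes
            self.store[key] = value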

How do we optimise such a log without suffering the penalty of constant writes? How do we use the log to recover efficiently? These are the questions answered by ARIES (Algorithms for Recovery and Isolation Exploiting Semantics); the ARIES paper describes mechanisms for transaction recovery.

Each log entry is structured to keep logging fast and recovery cheap. Each entry has the following fields (a minimal record layout is sketched after the list):

  • Log sequence number (LSN): orders entries chronologically
  • Transaction ID: identifies the transaction responsible for the action
  • Page ID: the location of the modified data. A database page is the minimum unit of I/O, so this is where the data lives on disk.
  • Previous LSN: a reference to the previous log entry of the same transaction
  • Undo: information needed to reverse the operation
  • Redo: information needed to reapply the operation
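
Putting these fields together, a log record might look like the following sketch. The field names are illustrative, and the undo/redo "functions" are represented here as data (such as before- and after-images of the affected page), which is how they are typically stored in practice:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LogRecord:
        lsn: int                 # log sequence number: fixes a chronological order
        transaction_id: int      # the transaction responsible for this action
        page_id: int             # the disk page holding the modified data
        prev_lsn: Optional[int]  # previous record of the same transaction (None if first)
        undo_info: bytes         # e.g. a before-image, enough to reverse the operation
        redo_info: bytes         # e.g. an after-image, enough to reapply the operation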

In order to prevent these writes from becoming a bottleneck, they are buffered in memory. First the query is parsed and executed, then the buffer pool (which caches pages in memory) is updated, then the log is updated. At some later point the log entries are flushed to disk; the key constraint is that a transaction's log records must reach disk before its commit is acknowledged, and before the pages they describe are written. Deferring the writes in this way lets us fill whole units of the hardware's page size rather than wasting an I/O on every small record. Later again, the buffer pool cache is written to disk. So why do we need a log at all? Because appending to a sequential file is far cheaper than writing to a structure that is optimised for querying and reading.
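
A sketch of this buffering, assuming records accumulate in memory and are forced to disk in one sequential append when a transaction commits (the class name and record encoding are illustrative):

    import os

    class BufferedLog:
        """Accumulate log records in memory and flush them in one sequential
        write, amortising the cost of an I/O over many records."""

        def __init__(self, log_path):
            self.log_file = open(log_path, "ab")
            self.buffer = []

        def append(self, record: bytes):
            self.buffer.append(record)  # cheap: in-memory only, no I/O yet

        def flush(self):
            # One large sequential write instead of many small ones.
            self.log_file.write(b"".join(r + b"\n" for r in self.buffer))
            self.log_file.flush()
            os.fsync(self.log_file.fileno())
            self.buffer.clear()

        def commit(self, transaction_id):
            # The commit record, and everything before it, must be on disk
            # before the commit is acknowledged to the client.
            self.append(b"COMMIT %d" % transaction_id)
            self.flush()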