Implementing Recovery

COMP9315 21T1 ♢ Implementing Recovery ♢ [0/20]
❖ Recovery


For a DBMS to recover from a system failure, it needs

Assume multiple transactions are running when failure occurs A critical mechanism in achieving this is the transaction (event) log
COMP9315 21T1 ♢ Implementing Recovery ♢ [1/20]
❖ Logging

Three "styles" of logging

All approaches require: Known as write-ahead logging    (PostgreSQL uses WAL)
COMP9315 21T1 ♢ Implementing Recovery ♢ [2/20]
❖ Undo Logging

Simple form of logging which ensures atomicity.

Log file consists of a sequence of small records:

Notes:
COMP9315 21T1 ♢ Implementing Recovery ♢ [3/20]
❖ Undo Logging (cont)

Data must be written to disk in the following order:

  1. <START> transaction log record
  2. <UPDATE> log records indicating changes
  3. the changed data elements themselves
  4. <COMMIT> log record
Note: sufficient to have <T,X,v> output before X, for each X
COMP9315 21T1 ♢ Implementing Recovery ♢ [4/20]
❖ Undo Logging (cont)

For the example transaction, we would get:


t    Action        v  B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .    .     .     8     5   <START T>
(1)  READ(A,v)     8    8     .     8     5
(2)  v = v*2      16    8     .     8     5
(3)  WRITE(A,v)   16   16     .     8     5   <T,A,8>
(4)  READ(B,v)     5   16     5     8     5
(5)  v = v+1       6   16     5     8     5
(6)  WRITE(B,v)    6   16     6     8     5   <T,B,5>
(7)  FlushLog
(8)  StartCommit
(9)  OUTPUT(A)     6   16     6    16     5
(10) OUTPUT(B)     6   16     6    16     6
(11) EndCommit                                <COMMIT T>
(12) FlushLog

Note that T is not regarded as committed until (12) completes.

COMP9315 21T1 ♢ Implementing Recovery ♢ [5/20]
❖ Undo Logging (cont)

Simplified view of recovery using UNDO logging:

Assumes we scan entire log; use checkpoints to limit scan.
COMP9315 21T1 ♢ Implementing Recovery ♢ [6/20]
❖ Undo Logging (cont)

Algorithmic view of recovery using UNDO logging:

committedTrans = abortedTrans = startedTrans = {}
for each log record from most recent to oldest {
    switch (log record) {
    <COMMIT T> : add T to committedTrans
    <ABORT T>  : add T to abortedTrans
    <START T>  : add T to startedTrans
    <T,X,v>    : if (T in committedTrans)
                     // don't undo committed changes
                 else  // roll-back changes
                     { WRITE(X,v); OUTPUT(X) }
}   }
for each T in startedTrans {
    if (T in committedTrans) ignore
    else if (T in abortedTrans) ignore
    else write <ABORT T> to log
}
flush log

COMP9315 21T1 ♢ Implementing Recovery ♢ [7/20]
❖ Checkpointing

Simple view of recovery implies reading entire log file.

Since log file grows without bound, this is infeasible.

Eventually we can delete "old" section of log.

This point is called a checkpoint.
COMP9315 21T1 ♢ Implementing Recovery ♢ [8/20]
❖ Checkpointing (cont)

Problem: many concurrent/overlapping transactions.

How to know that all have finished?

  1. periodically, write log record <CHKPT (T1,..,Tk)>
    (contains references to all active transactions active tx table)
  2. continue normal processing (e.g. new tx's can start)
  3. when all of T1,..,Tk have completed,
    write log record <ENDCHKPT> and flush log
Note: tx manager maintains chkpt and active tx information
COMP9315 21T1 ♢ Implementing Recovery ♢ [9/20]
❖ Checkpointing (cont)

Recovery: scan backwards through log file processing as before.

Determining where to stop depends on ...

If we encounter <ENDCHKPT> first:

If we encounter <CHKPT (T1,..,Tk)> first:
COMP9315 21T1 ♢ Implementing Recovery ♢ [10/20]
❖ Redo Logging

Problem with UNDO logging:

Alternative approach is redo logging:
COMP9315 21T1 ♢ Implementing Recovery ♢ [11/20]
❖ Redo Logging (cont)

Requirement for redo logging: write-ahead rule.

Data must be written to disk as follows:

  1. start transaction log record
  2. update log records indicating changes
  3. then commit log record (OUTPUT)
  4. then OUTPUT changed data elements themselves
Note that update log records now contain <T,X,v'>,
where v' is the new value for X.
COMP9315 21T1 ♢ Implementing Recovery ♢ [12/20]
❖ Redo Logging (cont)

For the example transaction, we would get:


t    Action        v  B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .    .     .     8     5   <START T>
(1)  READ(A,v)     8    8     .     8     5
(2)  v = v*2      16    8     .     8     5
(3)  WRITE(A,v)   16   16     .     8     5   <T,A,16>
(4)  READ(B,v)     5   16     5     8     5
(5)  v = v+1       6   16     5     8     5
(6)  WRITE(B,v)    6   16     6     8     5   <T,B,6>
(7)  COMMIT                                   <COMMIT T>
(8)  FlushLog
(9)  OUTPUT(A)     6   16     6    16     5
(10) OUTPUT(B)     6   16     6    16     6

Note that T is regarded as committed as soon as (8) completes.

COMP9315 21T1 ♢ Implementing Recovery ♢ [13/20]
❖ Redo Logging (cont)

Simplified view of recovery using REDO logging:

Assumes we scan entire log; use checkpoints to limit scan.
COMP9315 21T1 ♢ Implementing Recovery ♢ [14/20]
❖ Undo/Redo Logging

UNDO logging and REDO logging are incompatible in

Undo/Redo logging combines aspects of both As for previous cases, requires write-ahead of log records.

Undo/redo loging is common in practice; Aries algorithm.

COMP9315 21T1 ♢ Implementing Recovery ♢ [15/20]
❖ Undo/Redo Logging (cont)

For the example transaction, we might get:


t    Action        v  B(A)  B(B)  D(A)  D(B)  Log
--------------------------------------------------------
(0)  BEGIN         .    .     .     8     5   <START T>
(1)  READ(A,v)     8    8     .     8     5
(2)  v = v*2      16    8     .     8     5
(3)  WRITE(A,v)   16   16     .     8     5   <T,A,8,16>
(4)  READ(B,v)     5   16     5     8     5
(5)  v = v+1       6   16     5     8     5
(6)  WRITE(B,v)    6   16     6     8     5   <T,B,5,6>
(7)  FlushLog
(8)  StartCommit
(9)  OUTPUT(A)     6   16     6    16     5
(10)                                          <COMMIT T>
(11) OUTPUT(B)     6   16     6    16     6

Note that T is regarded as committed as soon as (10) completes.

COMP9315 21T1 ♢ Implementing Recovery ♢ [16/20]
❖ Undo/Redo Logging (cont)

Simplified view of recovery using UNDO/REDO logging:

COMP9315 21T1 ♢ Implementing Recovery ♢ [17/20]
❖ Undo/Redo Logging (cont)

The above description simplifies details of undo/redo logging.

Aries is a complete algorithm for undo/redo logging.

Differences to what we have described:

COMP9315 21T1 ♢ Implementing Recovery ♢ [18/20]
❖ Recovery in PostgreSQL

PostgreSQL uses write-ahead undo/redo style logging.

It also uses multi-version concurrency control, which

MVCC simplifies some aspects of undo/redo, e.g.
COMP9315 21T1 ♢ Implementing Recovery ♢ [19/20]
❖ Recovery in PostgreSQL (cont)

Transaction/logging code is distributed throughout backend.

Core transaction code is in src/backend/access/transam.

Transaction/logging data is written to files in PGDATA/pg_wal

COMP9315 21T1 ♢ Implementing Recovery ♢ [20/20]


Produced: 20 Apr 2021