Transactions II
[This topic is presented in pages 150-154 in Tannenbaum.]
We have talked of the desired properties of transactions using the
ACID acronym:
- Atomicity: All or none behavior.
- Consistency: Each transaction is correct, that is, it
preserves the invariants of the system (things like "the sum of all
credits and debits is zero").
- Isolation: Intermediate results during execution are invisible.
This is like saying that the
effect of the concurrent execution of transactions
is equivalent to their serial execution (thus, also called Serializability).
- Durability: the effect of a transaction that completes successfully
(commits) is persistent.
Durability is taken care by saving results in persistent storage.
Consistency is taken care at the application, not at the tool level
[my money transfer transaction should worry about preserving accounting
rules].
Isolation is taken care with two-phase locking [or with timestamps].
We are left with a fundamental problem: atomicity. We need to
examine how to deal with failure, aborts, commits.
Atomic transactions on a single system
We consider first the case where a single computer is
involved in a transaction. Usually the data is maintaned in a database
(we could also have flat files) under the control of a database manager (DM)
(in this context it is often called just resource manager).
Since reading from disk is slow and since synchronous writes to
disk are very slow, all sorts of buffering schemes and caches are used.
Thus write operations to persistent storage
will have to be done in the correct order and,
unless one is careful, data will be lost in case of system crashes.
A number of implementation strategies are possible.
- WriteAhead Log - Redo-Only, also called Deferred-Update
We assume we have in stable storage a writeahead log (also called a
Redo Log, or a Database Log, or an Intentions List), it is just a sequential
file.
Initially all the operations are done without modifying the data base,
just writing to the log for each update command the command and
the data that has to be written to complete the operation.
When all this information has been logged (this phase is called
the precommit phase) a commit record is written to the log.
Anything of the transaction that was done before this commit, in the case of
failure, is lost and we are as if back to the beginning of the
transaction. After the commit, the transaction will be completed
on the real data, not the log, because
we have in the log all the information needed to carry it out.
This method of implementing
the transaction is called redo-only because when we
commit we need to redo each (update) operation of
the transaction (if we give up we need not do anything
to the database).
The order in which log records are written is important since they must
be done over in the same order (at least, conflicting operations
must be done in the same order as indicated in the log) in the redo phase.
Here is what is done in correspondence to
the various operations of a transaction:
- StartTransaction: Write e new transaction record to the WriteAhead Log.
- Write (or Update): Write a Redo Record to the Redo Log. This record
contains the id of the transaction,
the identity of the variable being updated and the value being written.
- Read: Now we need to check if the variable to be read has already
been updated in the transaction,
thus we have to combine the information in the Redo Log with what is in the
database or, better, we can keep the updated object in the cache until the
transaction is committed and the new value written to persistent memory.
- Abort: discard the Redo Log of this transaction (and the updatedS objects
from the cache).
- EndTransaction (Commit): After all log writesare completed,
write to log the commit record and
apply to the database in order the actions in the WriteAhead
Log, from oldest to newest.
- WriteAhead Log - Undo-Only, also called Update-in-Place
An alternative is
to have a precommit phase where for each update command
we save to the WriteAhead Log (also called an Undo Log) the data that will
be modified (so that we can undo the command) and we
execute on the real database the command. We then
write a commit record to the log. If there is a crash
before the transaction is completed, we rollback the old values
by reading them from the
log from last to first. This method of implementing the transaction is called undo-only
since in case of failure before the commit we need to undo all the
updates done to the database.
Here is what is done in correspondence to
the various operations of a transaction:
- StartTransaction: Write a new transaction record to the Undo Log.
- Write (or Update): Write an Undo Record to the Undo Log. This record
contains the id of the transaction, the identity of the variable being updated and its old value. Then write the
new value to the data base. The Undo Record (at least, the old value)
must be in disk before the new
value is written to the DB.
- Read: Just read the value from the data base.
- Abort: Carry out the undo operations in reverse order to the one they were
done in.
- EndTransaction (Commit): Write to disk a commit record for the
transaction. Then discard the undo log for the transaction. The commit record cannot
be written until the old values of all objects modified in the transaction have
been written to disk.
- Private Workspace
This is an interesting idea already encountered in Copy-On-Write (COW)
fork operations and in Write-Once file systems. The idea is to
give to each user its own copy of the data being updated, but this
copy is just a directory of the blocks affected (the blocks may come from
many files). Then when the update is made by a transaction to a block, it is written to a new
block (called a shadow block) and the directory of that transaction is updated to point
to this new block. When all updates are carried out and the new directory
is copied to a log in stable storage
together with a commit record, the transaction is
ready to be completed by updating atomically the real directories of the
database. Notice that in case of failure before commit we do not
need to do anything to the database, just update lists of free blocks since blocks allocated
to the aborted transactions can be recovered.
All logs are kept in stable storage. For each transaction we will have
a start transaction record and an end transaction record. Then space for
the log can be recovered by compaction after freeing the space associated
to completed transactions, or by treating the log as a circular tape
where we restart from the beginning after having reached the end.
Restart is the operation required after a system failure.
The restart operation will take care of the transactions that were not completely
done at the time of the crash. Aborted transactions are undone, committed
transactions are completed, and uncommitted transactions are restarted.
Checkpoints (i.e. the saving of intermediate values of objects
during a transaction) can be used to reduce the amount of work to be done
during restart.
Atomic Transactions in a Distributed System
The problem is the following: A number of systems are involved
in executing a transaction. One is the transaction manager or coordinator,
say it is system 1,
the system where the transaction was started.
All other systems have a resource manager that operates as indicated above
for transactions on single systems. A resource manager
is responsible
for carrying out the local operations keeping a local log
and, in addition, for communicating with the (transaction) manager.
Each system has available stable storage to record
reliably their progress and decisions across temporary
local failures and recoveries.
We assume that communication is reliable and we accept the fact
that any system may fail and recover. We want to reach the situation where
all systems complete in agreement that the transaction is
committed (and they all are in the new state) or that it is
aborted (and they all are in the old state). Of course
the agreement should be faithful to the truth: there
is commitment iff all systems will complete their tasks.
A solution is provided by the Two-Phase Commit Protocol.
Part 1:
The transaction(a program in the transaction) starts by informing
the transaction manager (or coordinator) that will write to
stable storage a StartTransaction record and inform all participant
resource managers (also called participants).
Now the transaction is Active.
Carry out the operations of the transaction at each system
as we did in the pre-commit phase in the single system transaction.
Agreement Protocol:
At this point all the operations of the transaction have been
carried out in a "non-committed" fashion. A program in the transaction
issues the "commit" request (only one program can do that).
The manager writes a pre-commit record to the log (stable storage)
and sends to all participating resource managers the request that
they report their decision to commit or abort (REQUEST-TO-PREPARE):
Round 1:
All participants write a pre-commit or abort record to
the log (stable storage) and report to the coordinator
their local decision to commit or abort. A participant
will send a commit reply (PREPARED) only if it has saved to stable
storage all its state and the pre-commit record. Otherwise
it replies NO.
The coordinator collects the answers [if any is late, it
assumed to be an abort] then decides to commit iff
all replies were to commit. The decision is recorded
in stable storage at the coordinator with a commit record
Round 2:
The coordinator informs all participants of the decision with
a COMMIT or ABORT message. Now the participants must
carry out that decision. [The coordinator
will keep sending the decision until it
is recorded in stable storage at all participants.]
A participant will acknowledge with a DONE message a received commit
message so that the coordinator can stop overseeing this
participant and release resources it may be holding.
Part 2:
The actions are finalized at all systems. This can be accomplished
reliably since what to do is recorder in stable storage
at each site. The transaction becomes Inactive.
Note that people think of the two phases as referring to Part 1 and
Part 2, while in reality the two phases refer to the two rounds of
the agreement protocol.
The agreement protocol requires 3(n-1) messages [4(n-1) if we count
the REQUEST-TO-PREPARE message],
where n is the number of systems.
A transaction can be in one of four states:
- Executing: From the start of the transaction to the start of the two-phase
protocol.
- Uncertain: During the two-phase protocol.
- Committed or aborted: After the two-phase commit protocol before the
completion of the transaction.
The following two pictures display respectively the interactions between
the coordinator and a participant during a transaction,
and what they do in the case of failures.
The two-phase commit protocol is imperfect since
it assumes that no system will fail permanently.
- Consider the case that a participant
j fails permanently after round 1 when all had agreed to commit.
Then the coordinator decides to commit and informs the others in Part 2
but the transaction cannot complete since system j is not there to
accept the message from the coordinator and acknowledge it,
and the coordinator cannot abort because
by now some systems may have finalized their action.
The transaction remains in the uncertain state and becomes blocked.
Some extraordinary action must be taken, usually in the form
of a compensating transaction that undoes or somehow fixes
the previous transaction.
- It is also a problem if
the coordinator fails permanently at the end of round one, just before
recording its decision. The participants that have failed will know what to do
[abort, thus re-establishing the previous state],
but those that had decided to commit are in trouble [they are left hanging
on how to finalize the transaction, i.e. they are blocked] and the
transaction remains in the uncertain state. These blocked systems will attempt
to reach resolution by sending queries to the coordinator requesting
disposition information for the incomplete transaction.
So some additional work, not part of the two-phase commit
protocol, needs to be done to handle these problems.
Basically, we need to have some way of knowing when we can close the
books on a transaction and say that it is done.
A three-phase commit protocol has been suggested to deal
with the weaknesses of the two-phase protocol.
Impossibility of Guarantieing Consensus in Asynchronous Systems
The problems enclountered in the Two-Phase Commit Protocol (the blocking
of the transaction) is an example of a general problem: our inability
to guarantee arriving at a consensus in a system where
- messages are reliable,
- the execution and communication times of systems are not
with known upper bounds
- one system may fail.
Because of this impossibility, one aims not to guaranty consensus, but to
minimize the probability of failing of achieving consensus. Also, one
aims to establish
emergency procedures for handling failures.