We have from the beginning of the course worried about the concurrent execution of activities, their possible interleavings, about mutual exclusion, and critical sections. For example we saw when talking of undesired interleavings what are lost updates and inconsistent retrievals. Here we review them and mention two other kinds of undesirable interleavings caused by violations of the ACID properties and transaction aborts.
A1: Read value of Account 1 A2: Add 100 Dollars to it B1: Read value of Account 1 B2: Add 1000 Dollars to it B3: Store value in Account 1 A3: Store value in Account 1This interleaving results in a balance of 1100 dollars and not 2100 dollars as it should be.
A1: Read value of Account 1 A2: Subtract 100 dollars from it A3: Store value in Account 1 B1: Read value of Account 1 B2: Print this value B3: Read value of Account 2 B4: Print this value A4: Read value of Account 2 A5: Add 100 dollars to it A6: Store value in Account 2This interleaving will result in the apparent disappearence of 100 dollars.
A0: Start A1: Read value of Account 1 (say 100) A2: Add 10 to it A3: Store value in Account 1 B0: Start B1: Read value of Account 1 B2: Add 5 to it B3: Store value in Account 1 B4: End A4: AbortThe problem in this interleaving is that B uses a value updated by a transaction that has not yet been committed.
A0: Start A1: Read value of Account 1 (say 100) A2: Add 10 to it A3: Store value in Account 1 B0: Start B1: Read value of Account 1 B2: Add 5 to it B3: Store value in Account 1 B4: Abort A4: End
T1: R1(a) R1(b) W1(a) T2: R2(b) R2(a) W2(b)and the schedule H1:
H1: R1(a) R1(b) R2(b) W1(a) R2(a) W2(b)Since R2(b) and W1(a) operate on different objects, their order could be changed without affecting the result when executing H1. Thus we would get the schedule H2:
H2: R1(a) R1(b) W1(a) R2(b) R2(a) W2(b)which is just the serial execution of T1 followed by T2. In this case we say that the schedule H1 is serializable. Also the schedule H3 is serializable:
H3: R1(b) R1(a) R2(b) W1(a) R2(a) W2(b)Of course not all schedules are serializable. For example the schedule H4:
H4: R1(a) R1(b) R2(a) W1(a) R2(b) W2(b)is not serializable since R2(a)W1(a) cannot be reordered as W1(a)R2(a) since the values for a used in T2 in the two cases are different. [This is basically what was said with the Bernstein's conditions.]
We have a general way to determine if a schedule is serializable. Suppose that we have a schedule of transactions T1, T2, .., Tn. We create a serialization graph with nodes T1, T2, .., Tn. We then put an arc from node Ti to node Tj (and we say Ti precedes Tj) if we find in the schedule an Ri(x) before a Wj(x) or a Wi(x) before an Rj(x) or a Wj(x). Such operations are said to be conflicting. Notice that these operations will have to be protected by read or write locks and Ti will have to release the lock on x before Tj can acquire the lock on x.
Serializability Theorem: A schedule is serializable if and only if its serialization graph is acyclic.
But it is not reasonable to expect that transactions will be executed checking each time that no loop is inserted in the serialization graph.
Theorem: If transactions follow the two-phase locking policy then they will be serializable [that is, all their feasible schedules, i.e. schedules that respect the locks, are serializable].
In fact, by contradiction, if we had transactions that follow the two-phase locking policy and give origin to a schedule that is not serializable, the serialization graph for that schedule would be cyclic. Thus we would have a loop from transactions T1, to T2, .. to Tm, and back to T1. And each of these transactions would have to release a lock before the next can acquire it. Thus T1 would first release a lock and then acquire a lock, contradicting the fact that T1 is two-phase.
Two problems remain when using two-phase locking:
T1: L1(x) W1(x) L1(y) W1(y) U1(x) U1(y) T2: L2(y) W2(y) L2(x) W2(x) U2(y) U2(x)we can create the schedule
L1(x) W1(x) L2(y) W2(y) L1(y) L2(x) ...that is two-phase locking and results in deadlock when T1 executes L1(y) and T2 executes L2(x).
T1: S1 L1(x) W1(x) U1(x) A1 T2: S2 L2(x) R2(x) W2(x) U2(x) ..the two-phase locking policy will allow the schedule
S1 L1(x) S2 L2(x) W1(x) U1(x) R2(x) W2(x) U2(x) A1 ..which will require a cascaded abort A2 after T1 aborts.
Cascading aborts can be fully prevented if a transaction does not release any lock until the transaction is committed. An intermediate position is to allow the release of read-locks but keeping write-locks until the transaction is committed [this allows cascading aborts, but of a kind that can be easily handled as we will see when we discuss write-ahead logs].
There is an alternative to the use of locks, with consequent possibility
of deadlock: by using timestamps. Suppose that each object is timestamped
with the start time of the last transaction to access that object. Then a
transaction T will abort if it accesses an object that has a timestamp later
than the start time of T. This is bad if these aborts occur frequently,
but if rare, this technique is fine. This way of executing transactions
is called the Strict Execution of Transactions policy.
There is also a problem with how to
guarantee that the timestamp really reflect the true ordering of transaction
start times. This will be discussed when we study logical locks.
A0: Start A1: Something A2: Start A3: Something A4: Abort A5: Start A6: Something A7: End A8: Something A9: End is equivalent to just A0, A1, A5, A6, A7, A8, A9.