Here is how a client would operate
.......... Send request to M Wait for reply from M Use resource R Send release notice to M ..........This means that each use of the resource will have as overhead 3 messages. If W is the time we use the resource and T is the time taken by each message then the total time spent for a critical region is (W+3T) and the maximum rate of use of the resource for a single user is 1/(W+3T). Requests of multiple users can occur while the resource is being used, thus we can have the utilization W/(W+2T).
Now we look at what is done by the manager M:
It uses two variables: resourceIsFree, a boolean indicating state of resource, initially true; rqueue, a FIFO queue containing pending requests, initially empty; loop { wait for a message; if the message is a request { if resourceIsFree { allocate resource to requestor; set resourceIsFree to false; } else insert request in rqueue; } else /* it is a release */ if rqueue is empty set resourceIsFree to true; else { remove a request from rqueue; send reply to that requestor;} }Of course we are in trouble if the Manager dies. Firstly we do not know its current state (i.e., the state of the resource and who is waiting), second, we don't have a coordinator anymore. For the second problem we hold an election to select a new manager. For the first problem, each process has to recognise failure and adjust accordingly [each process knows its own state at the time of the failure and can communicate it to the new manager].
Notice the program executed by the manager M: after initialization it consists just of "handlers" for the various possible message types. This is typical for "responsive systems", i.e. systems whose purpose it to be available to answer messages from clients. Notice also that while the protocol for mutual exclusion is going on, the protocol for hearbeat is also executing, and so may also other protocols.
Each process must be written to include two threads of control. One to deal with the communication activities, the other to carry out the computations of this process.
self: the identity of this process (a number like 7); localClock: an integer representing a local logical clock (initially 0); used to record the time at which this node last made a request highestClock: an integer, the largest clock value seen in a request (initially 0); need: an integer representing number of replies needed before self can enter the critical region; pending: a boolean array with n entries; entry k (k!=self) is true iff process Pk has made a request and process self has not responded; the self entry is true if process self wants to enter critical region; mutex: mutual exclusion semaphore used to protect shared variables. s: a blocking semaphore used to hold the processing thread while the comunication thread takes care of synchronization.
loop { wait for a message; P(mutex); if the message is a request { let k be the identity of requestor and c the timestamp on message; set highestClock to max(localClock, c); if (pending[self] and ([self,localClock] < [k, c])){//[k,c] is less recent // [self,localClock] is less than [k,c] if localClock is // less than c or they are equal and self is less than k set pending[k] to true; } else { send response to k-th requestor;} } else { // the message is a reply decrement need by 1; if need is 0 // Replies come only if we sent a request V(s);} // Free the processing thread V(mutex): }
P(mutex); set pending[self] to true; set localClock to highestClock + 1; set need to n-1; V(mutex); send to all other processes a request stamped with [self, localClock]; P(s); // Here we wait for release from communication thread CRITICAL REGION P(mutex); set pending[self] to false; for k from 1 to n { if pending[k] { send response to k; set pending[k] to false;} V(mutex);This algorithm, due to Ricart-Agrawala has been extended by Raymond to the case where more that 1 process can be in the critical region at one time and to the case where requests are for multiple copies of a resource.
The following algorithm, a variation on the Garcia-Molina algorithm, carries out elections in a semi-synchronous distributed system. (A distributed system is semi-synchronous if there are known deadlines on the delivery of messages.) It is assumed that each node has an identifier, a number, and that that number represents the priority of the node. The node with highest priority among the available nodes will become the new coordinator after an election.
The algorithm has a number of parts:
A node can be in one of the following states:
A number of kinds of messages are sent by the elector (i.e. the node starting the election):
Each node keeps the following variables:
Elect: //Phase 1 For each node j > SELF Send STATUS_POLL message to j; Wait with timeout for reply; If there is a reply return; // SELF cannot be the new coordinator // Phase 2 Set ACTIVE to the empty set; For each node j < SELF Send ELECTION message to j Wait with timeout for reply; If there is a reply add j to ACTIVE; // Phase 3 Set ANSWERS to the empty set; For each node j < SELF Send COORDINATOR SELF message to j; Wait with timeout for reply; If there is a reply add j to ANSWERS; If ANSWERS != ACTIVE Goto Elect; // We restart things since the network has changed // Phase 4 Set ANSWERS to empty; For each node j in ACTIVE Send NEW-STATE message and recovery information to j; If there is a reply add j to ANSWERS; If ANSWERS != ACTIVE Goto Elect;
while (1) { Wait for a message m; switch (m.messageType) { case STATUS_POLL: Send SELF, STATUS; break; case ELECTION(sender): Set STATUS to ELECTION; Let CANDIDATE be a new variable and set it to sender; Stop processing on this node; Stop any other election that may be in progress at this time; Send SELF; break; case COORDINATOR(sender): If Status == ELECTION and sender == CANDIDATE Set STATUS to REORGANIZATION; Set COORDINATOR to sender; Send SELF; break; case NEW_STATE(sender,RecoveryInfo): If STATUS == REORGANIZATION and sender == COORDINATOR Set STATUS to NORMAL; Use the Recovery Information;} }
If SELF == COORDINATOR and STATUS == NORMAL For each node j Send STATUS-POLL message to j; Wait with timeout for reply(sender, status); If j is not in ACTIVE or returned status is not NORMAL Call the election procedure;
Set STATUS to DOWN; Call the election procedure;