CIS 307: Spinlocks and Semaphores

[TestAndSet], [Spinlocks], [Definition and initial implementation of Semaphores], [Comments], [Posix Semaphores], [Java Semaphores], [Nutt's Implementation]

TestAndSet

We consider a machine instruction that supports concurrency. Always be conscious of the hardware support for Operating Systems functions. There are a number of reasons for this support. It may be the only way to solve a problem and, almost always, it allows solutions that are more efficient than purely software solutions. [A classic example of hardware support is for interrupts: during the execution of each instruction there is a test to see if a request for interrupt has arrived. If it has and is of sufficient priority, the instruction is suspended and the interrupt handled. Interrupts are made possible by (hidden) polling!]

We can think of the TestAndSet instruction as a function implemented atomically in hardware. Here is this function:

	int TestAndSet(int *x){
	   register int temp = *x;
	   *x = 1;
	   return temp;
	}

The hardware can achieve this effect as follows. The instruction involves two Processor/Memory bus interactions:

Read from x into a register temp
Write one to x

No access to the memory location x may be allowed between these two interactions (this can be achieved, for instance, by maintaining bus mastery). TestAndSet is an example of atomic instructions that support concurrency. Another frequently used instruction is the atomic Exchange operation:

         void Exchange(int *a, int *b) {
            int temp = *a;
            *a = *b;
            *b = temp;
         }

and another is CompareAndSwap:

         boolean CompareAndSwap(int *a, int old, int new) {
            if (*a == old) {	/* swap only if *a still holds the expected value */
               *a = new;
               return true;
            } else 
               return false;
         }
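
In portable C, these three primitives correspond roughly to operations in the C11 header stdatomic.h. The following is a minimal sketch, assuming a C11 compiler; which machine instruction is actually emitted is up to the compiler. Note one deliberate difference: Exchange here takes the new value and returns the old one, instead of swapping two memory locations, because only the shared operand needs to be atomic. The demo function is a hypothetical single-threaded walk-through added for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Returns the old value of *x and leaves *x set to 1, atomically. */
int TestAndSet(atomic_int *x) {
    return atomic_exchange(x, 1);
}

/* Atomically stores val in *a and returns the previous contents. */
int Exchange(atomic_int *a, int val) {
    return atomic_exchange(a, val);
}

/* If *a equals old, stores new_val in *a and returns true;
   otherwise leaves *a unchanged and returns false. */
bool CompareAndSwap(atomic_int *a, int old, int new_val) {
    return atomic_compare_exchange_strong(a, &old, new_val);
}

/* Single-threaded walk-through of the semantics. */
void demo(void) {
    atomic_int lock = 0;
    int first  = TestAndSet(&lock);            /* was free, is now set  */
    int second = TestAndSet(&lock);            /* already set           */
    int old    = Exchange(&lock, 0);           /* put 0 back, get 1     */
    bool hit   = CompareAndSwap(&lock, 0, 7);  /* matches: 0 becomes 7  */
    bool miss  = CompareAndSwap(&lock, 0, 9);  /* no match: stays 7     */
    assert(first == 0 && second == 1 && old == 1);
    assert(hit && !miss && atomic_load(&lock) == 7);
}
```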

SpinLocks

Spinlocks are abstract data types, or classes, that support a Busy-Wait solution for the Mutual Exclusion Problem. The operations on spinlocks are InitLock, Lock, and UnLock. If mutex is a spinlock that has been initialized using InitLock, then Critical Regions S1, S2, .., Sn can be protected from each other by preceding each region by a call to Lock and following it by a call to UnLock. The region Si becomes:

	Lock(mutex);
	Si;
	UnLock(mutex);

The pseudo code for implementing SpinLocks follows:


	typedef int SpinLock;

	void InitLock(SpinLock *L)
	{   *L = 0; }

	void Lock(SpinLock *L)
	{   while (TestAndSet(L)) 
		;
	}

	void UnLock(SpinLock *L)
	{   *L = 0; }

As you can see, the implementation is quite simple and has no system overhead, since TestAndSet is a user-mode instruction. It works because of the properties of the TestAndSet instruction. Notice that this solution involves Busy Waiting (polling).
Here is code for this kind of spinlock that works on the i386 architecture.
On Intel Itanium hardware, with multiple execution elements, this implementation would not work because code such as

	lock(&s);
	n++;
	unlock(&s);
might be executed as

	lock(&s);
	s = 0;		/* the store done by unlock, moved ahead of n++ */
	n++;

because of hardware reordering of instructions. It is necessary to insert a barrier instruction in the unlock operation. [Same for lock.]
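
The barrier requirement can be expressed in portable C with the acquire and release memory orders of C11 atomics, which let the compiler emit whatever barrier the target machine needs. The following is a sketch under the assumption of a C11 compiler and POSIX threads; the shared counter n, the worker function, and run_counter_demo are hypothetical names used only to exercise the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef atomic_flag SpinLock;

static SpinLock s = ATOMIC_FLAG_INIT;
static long n = 0;               /* shared counter protected by s */

void Lock(SpinLock *l) {
    /* acquire: later loads and stores cannot move before this point */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

void UnLock(SpinLock *l) {
    /* release: earlier loads and stores (the n++) cannot move after it */
    atomic_flag_clear_explicit(l, memory_order_release);
}

#define ITERS 50000

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        Lock(&s);
        n++;
        UnLock(&s);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final value of n. */
long run_counter_demo(int nthreads) {
    pthread_t t[16];
    assert(nthreads > 0 && nthreads <= 16);
    n = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return n;
}
```

If the lock and its ordering are correct, no increment is lost, so the value returned is nthreads * ITERS.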

Here is a possible implementation of spinlocks using the Exchange instruction:

        typedef int SpinLock;
        void InitLock(SpinLock *s) {
           *s = 0;
        }
        void Lock (SpinLock *s) {
           int L = 1;
           do {
              Exchange(&L, s);
           } while (L == 1);
        }
        void UnLock (SpinLock *s) {
           *s = 0;
        }

and here using the CompareAndSwap instruction:

        typedef int SpinLock;
        void InitLock(SpinLock *s) {
           *s = 0;
        }
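        void Lock (SpinLock *s) {
           /* CompareAndSwap(s, 0, 1) succeeds only when the lock is free (0), and then marks it taken (1) */
           do {
           } while (!CompareAndSwap(s, 0, 1));
        }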
        void UnLock (SpinLock *s) {
           *s = 0;
        }

The beauty of the CompareAndSwap instruction is that it can be used to implement data structures such as linked lists without the need of spinlocks or semaphores (non-blocking operations).
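
As an illustration of such a non-blocking operation, here is a minimal sketch of inserting at the head of a linked list (used as a stack) with a compare-and-swap retry loop. It assumes C11 atomics; the names Node, Stack, and Push are hypothetical. Only insertion is shown: lock-free removal needs extra care (the so-called ABA problem) and is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

typedef struct {
    _Atomic(Node *) head;    /* top of the stack, NULL when empty */
} Stack;

/* Pushes value without taking any lock.
   Returns 0 on success, -1 if memory could not be allocated. */
int Push(Stack *st, int value) {
    assert(st != NULL);
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = atomic_load(&st->head);
    /* Install n only if head is still the node we linked to. When the
       compare fails, the call stores the current head into n->next,
       so the loop simply retries with the fresh value. */
    while (!atomic_compare_exchange_weak(&st->head, &n->next, n))
        ;
    return 0;
}
```

A thread that loses the race never waits for the winner to release anything; it just retries, which is what makes the operation non-blocking.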

A big problem of the spinlock is that by repeatedly executing TestAndSet on a location in memory it can use a high percentage of the memory bus bandwidth, thus reducing the utilization of other processors sharing the memory bus. This location that is repeatedly accessed is called a Hot Spot in the memory. A partial solution (in the case of cache-coherent shared memories) is to change the Lock operation as follows:

	void Lock(SpinLock *L)
	{   while (TestAndSet(L))
               while (*L == 1)
		  ;
	}

which uses less of the memory bandwidth since the TestAndSet instruction involves both a read and a write transaction on the memory while (*L==1) involves only a read operation, and this read operation is normally carried out on a cached copy. [The inner While Loop operates on the cached copy without affecting the shared memory. It assumes that the hardware supports some form (like snooping caches) of cache coherence algorithm which invalidates the cache copy when the shared memory copy is updated.]
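
The same two-level loop can be written with C11 atomics. This is a sketch, assuming a C11 compiler; atomic_exchange stands in for TestAndSet, and the relaxed load in the inner loop is the read-only test that is normally satisfied from the cache.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef atomic_int SpinLock;

void InitLock(SpinLock *L) {
    atomic_init(L, 0);
}

void Lock(SpinLock *L) {
    assert(L != NULL);
    /* Outer loop: the read-and-write attempt (the TestAndSet). */
    while (atomic_exchange_explicit(L, 1, memory_order_acquire)) {
        /* Inner loop: read-only spin until the lock looks free. */
        while (atomic_load_explicit(L, memory_order_relaxed) == 1)
            ;
    }
}

void UnLock(SpinLock *L) {
    atomic_store_explicit(L, 0, memory_order_release);
}
```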

There is no guarantee of Fairness for spinlocks. That is, if two processes P and Q are trying to enter a critical region protected by a locked spinlock mutex, then there is no guarantee that they will be serviced in FIFO order. Their order of entry will be random. More serious is the problem that the process P may use the spinlock within a loop and acquire the lock repeatedly while Q is unable to acquire it. [How can this happen?] This behavior could be avoided by giving a "Nice" behavior to the UnLock operation: when UnLock is executed, it checks whether another process is waiting and, if so, the current process defers (gives up control of the CPU) to it. Of course, in most situations this is an unwarranted precaution.
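
Where FIFO order does matter, one well-known alternative is the ticket lock. Note that this is a different technique from the deferring UnLock described above, and that it relies on an atomic fetch-and-add rather than on TestAndSet. The sketch below assumes C11 atomics; the type and field names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;   /* next ticket to hand out          */
    atomic_uint now_serving;   /* ticket currently allowed to enter */
} TicketLock;

void InitLock(TicketLock *L) {
    atomic_init(&L->next_ticket, 0);
    atomic_init(&L->now_serving, 0);
}

void Lock(TicketLock *L) {
    /* Take a ticket: fetch-and-add makes tickets unique and ordered. */
    unsigned my = atomic_fetch_add_explicit(&L->next_ticket, 1,
                                            memory_order_relaxed);
    /* Spin until our number comes up. */
    while (atomic_load_explicit(&L->now_serving, memory_order_acquire) != my)
        ;
}

void UnLock(TicketLock *L) {
    /* Only the holder writes now_serving, so a load and a store suffice. */
    unsigned cur = atomic_load_explicit(&L->now_serving, memory_order_relaxed);
    assert(cur != atomic_load(&L->next_ticket));   /* lock must be held */
    atomic_store_explicit(&L->now_serving, cur + 1, memory_order_release);
}
```

Waiters enter in the order in which they took their tickets, so a process that releases the lock and immediately asks for it again queues up behind those already waiting.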

Spinlocks are dangerous to use in uniprocessor systems because of the Priority Inversion Problem. [A low priority process is in the critical region. A high priority process tries to enter the region. The low priority process cannot get the CPU which is held by the high priority process. The high priority process spins in the spinlock trying to enter the region being held by the low priority process. We have a deadlock. ]

Two possible solutions to the priority inversion problem:

Determine the maximum priority among the processes that can execute these critical regions, then make sure that all processes execute the critical regions at this priority. Since now competing processes have the same priority, they cannot preempt each other.
Execute the critical regions with masked interrupts. Thus the current process cannot be preempted (in the case we are considering there is only one processor).

A related problem is the formation of scheduling convoys. This occurs when a process ceases to run in the middle of a critical region [because of page fault, quantum expiration, external interrupt] and then all other processes sharing this region have to wait for its completion. A number of techniques are being introduced to deal with this problem, in particular Preemption-safe Locking and Non-blocking Algorithms. [Preemption-safe locking tries to avoid preemption during a critical region by having the kernel extend the quantum of a process in the critical region or by increasing its priority. Non-blocking algorithms, especially for implementing shared counters and queues, make sure that if a process blocks within a critical region, it can be brought back to the beginning of the critical region and another process is allowed to enter the region.]

In general spinlocks with their busy waiting mechanism are desirable (as compared to non busy waiting mechanisms like semaphores) when the scheduling and context switching time are greater than the time required to execute the critical region, or when there are processors not required to do other tasks.

The efficient implementation of spinlocks is of great concern because spinlocks are the usual foundation for synchronization, especially in multiprocessor systems. They are used everywhere and very frequently. As the use of multiprocessor systems increases, so will the interest in efficient spinlocks and in alternative synchronization mechanisms.

Definition and initial implementation of Semaphores

People use the term "Semaphore" to refer to a variety of synchronization mechanisms. Here by "Semaphore" we mean the "non-busy-waiting" analog of SpinLocks. We show a possible implementation for semaphores. For now assume that semaphore's operations are atomic.


	type semaphore = record
			    count : integer;
  1.			    q     : queue of processes;
			 end record;
	procedure initsem (s : in out semaphore; n : integer) is
	begin
	   s.count := n; initialize s.q to the empty queue;
	end;
	procedure P(s : in out semaphore) is
	begin
  2.	   s.count := s.count - 1;
  3.	   if s.count < 0 then 
  4.	      Remove this process from Running, add it 
	              to s.q, and reschedule;
	end;
	procedure V(s : in out semaphore) is
	begin
  5.	   s.count := s.count + 1;
  6.	   if s.count <= 0 and s.q is not empty then
  7.	      Remove a process from s.q, add it to the Ready 
		      Queue, and reschedule;
	end;

Comments

On line 1. we talk of a "queue of processes". A process is usually represented by a process id, a pointer or an index that refers to a Process Control Block (PCB). A PCB contains the relevant information about a process state.
Lines 2., 3., and 4. form a critical region. The same is true for lines 5., 6., and 7.. As such they need to be enforced either by masking interrupts (in single processor systems) or by using spinlocks.
The P operation is also called Down or Wait. The V operation is also called Up or Signal.
Reschedule operates as follows: say A is the highest priority process in the Ready queue (you may assume that there is always one such process, at least a Null process).
- If A has higher priority than the currently running process B, move B to the Ready queue and A to Running. [This is a pre-emption - in some cases, for efficiency, the switch is delayed until the time-slice expires or the current process blocks.]
- If A does not have higher priority than the currently running process B, do nothing.
- If there is no running process, move A to Running.
In all cases, resume the (possibly new) running process.
If initsem is called with the parameter n set to 0, we say that we have a Blocking Semaphore; if with 1, we say that we have a Boolean or Mutual Exclusion Semaphore; and if n is greater than 1, we say that we have a Counting Semaphore.
Boolean semaphores are the non-busy-waiting analogs of spinlocks.
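
At user level, the pseudocode above can be approximated with a Pthreads mutex and condition variable. This is only a sketch under stated assumptions: the mutex provides the atomicity of P and V, the condition variable stands in for the queue of processes (it does not promise FIFO order), and the wakeups field is an addition, not part of the pseudocode, needed because a condition wait may return spuriously.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int count;               /* as in the pseudocode; negative = waiters  */
    int wakeups;             /* V operations not yet consumed by a waiter */
    pthread_mutex_t mutex;   /* makes P and V atomic                      */
    pthread_cond_t  q;       /* stands in for the queue of processes      */
} semaphore;

void initsem(semaphore *s, int n) {
    assert(n >= 0);
    s->count = n;
    s->wakeups = 0;
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->q, NULL);
}

void P(semaphore *s) {
    pthread_mutex_lock(&s->mutex);
    s->count--;
    if (s->count < 0) {
        /* Block until a V has been posted for us. */
        do {
            pthread_cond_wait(&s->q, &s->mutex);
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->mutex);
}

void V(semaphore *s) {
    pthread_mutex_lock(&s->mutex);
    s->count++;
    if (s->count <= 0) {     /* somebody is waiting */
        s->wakeups++;
        pthread_cond_signal(&s->q);
    }
    pthread_mutex_unlock(&s->mutex);
}
```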

Using semaphores we can achieve a number of different behaviors:

If S is a boolean semaphore, then we can implement related critical regions by prefixing them with the statement P(S); and affixing them with V(S);
If S is a counting semaphore initialized to n (say, 10), then up to n (10) different processes could be executing a region enclosed by P(S)..V(S) statements.
If S is a blocking semaphore, then we can make sure that a process completes execution of statement A before another process starts execution of statement B, by following A with the statement V(S) and preceding B with the statement P(S). This is for example one way of implementing the arc semantics in Precedence Graphs.
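
The third behavior (A before B) can be sketched with two threads and a blocking semaphore. The example assumes a system that provides POSIX threads and unnamed POSIX semaphores, where sem_post plays the role of V and sem_wait the role of P; the names result, do_A, and run_precedence_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t S;          /* blocking semaphore, initial count 0 */
static int result;       /* written by A, read by B             */

static void *do_A(void *arg) {
    (void)arg;
    result = 42;         /* statement A                          */
    sem_post(&S);        /* V(S): announce that A is complete    */
    return NULL;
}

/* Runs A in a second thread and B in the caller; returns what B saw. */
int run_precedence_demo(void) {
    pthread_t t;
    result = 0;
    int rc = sem_init(&S, 0, 0);
    assert(rc == 0);
    pthread_create(&t, NULL, do_A, NULL);
    sem_wait(&S);        /* P(S): B may not start before A is done */
    int seen = result;   /* statement B                            */
    pthread_join(t, NULL);
    sem_destroy(&S);
    return seen;
}
```

Whichever thread the scheduler runs first, B cannot pass P(S) until A has executed V(S), so B always observes the value written by A.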

Semaphores do not suffer the same form of the Priority Inversion Problem as Spinlocks. You may see the kind of Priority Inversion Problem that can arise for Semaphores in the following note about the Mars Rover. We have a high priority process communicating with a low priority process using a small critical region. While the low priority process is in the critical region, the high priority process waits on the semaphore protecting the region. If an intermediate, independent process starts running at this point, it preempts the low priority process and thus prevents the high priority process from getting on its important way.

Posix Semaphores

Posix has defined semaphores that have the behavior we have described above. However at present they work only within a process to synchronize threads, not across processes. We will not use them in this course preferring the use of mutex locks and condition variables (see the threads discussion in the notes).
The operations defined in Posix for semaphores are:

       #include <semaphore.h>

       int sem_init(sem_t *sem, int pshared, unsigned int value);
       int sem_wait(sem_t * sem);
       int sem_trywait(sem_t * sem);
       int sem_post(sem_t * sem);
       int sem_getvalue(sem_t * sem, int * sval);
       int sem_destroy(sem_t * sem);

where the second parameter of sem_init should be 0 (meaning that the semaphore is shared among the threads of one process) and the third parameter should be the intended initial count for the semaphore (1 => mutual exclusion, 0 => blocking, > 1 => counting). sem_wait and sem_post are the P and V operations, respectively. sem_trywait never blocks: if the count was positive, it acts as sem_wait and returns 0, and if the count is <= 0 it does nothing and returns -1. sem_getvalue just returns the current count. sem_destroy is the destructor for this object.
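
A short sketch of the non-blocking behavior of sem_trywait follows, assuming a system that supports unnamed POSIX semaphores. The function take_without_blocking is a hypothetical helper written for this illustration; it relies on sem_trywait setting errno to EAGAIN when the count is zero.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Creates a semaphore with the given initial count, calls sem_trywait
   n times, and returns how many of those calls succeeded. */
int take_without_blocking(unsigned int initial, int n) {
    sem_t sem;
    int taken = 0;
    int rc = sem_init(&sem, 0, initial);   /* pshared = 0: threads only */
    assert(rc == 0);
    for (int i = 0; i < n; i++) {
        if (sem_trywait(&sem) == 0)
            taken++;                       /* count was positive         */
        else
            assert(errno == EAGAIN);       /* count was 0, nothing done  */
    }
    sem_destroy(&sem);
    return taken;
}
```

With an initial count of 2 and five attempts, only the first two calls can succeed; the remaining three return -1 immediately instead of blocking.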

Java Semaphores

Though Java currently does not predefine semaphores, preferring to let users rely on the more general synchronized method approach, the new version of Java (1.5) will predefine a Semaphore class with characteristics similar to what is described in this note, but more general.

Nutt's Implementation

Here we show code similar to the code used by G.J.Nutt in his Operating Systems book. This code implements N semaphores for a multiprocessing system.

#define	 N   ..         /* The number of semaphores */

int      value[N];	/* Keeps the count for the semaphores   */
boolean  sem[N];	/* Spinlocks to protect critical region */
			/*   of semaphore */
boolean  hold[N];	/* Used to temporarily "enqueue" tasks  */
			/* waiting on sem */

void initsem(int s, int m) {
	value[s] = m; sem[s] = FALSE; hold[s] = TRUE;}
void P(int s) {
	while (TS(sem[s]));	/* Lock */
	value[s]--;
	if (value[s] < 0) {
	   sem[s] = FALSE;	/* Unlock */
	   while(TS(hold[s])) yield(Q); /* Here the process enters */
			/* a new Critical Region where it gives    */
			/* control to Q, a scheduler. It is a loop */
			/* since we may repeatedly be given control*/
			/* back from yield and find a blocked semaphore.*/
	}else
	   sem[s] = FALSE;}	/* Unlock */
void V(int s){
	while (TS(sem[s]));	/* Lock */
	value[s]++;
	if (value[s] <= 0){	/* Somebody is waiting in hold to enter CR*/
	   while (!hold[s]);	/* We check if there is a concurrent V. */
				/* It is a loop */
				/* because we might otherwise have two or more */
				/* consecutive V operations all setting (see next */
				/* statement) hold[s] to false. */
	   hold[s] = FALSE;	/* Here we free a process waiting in P,*/
				/* namely on hold[s].  */
	}
	sem[s] = FALSE;}	/* Unlock */

ingargio@joda.cis.temple.edu