CIS 307: TOPIC: Performance Evaluation
Introduction
Engineers do it. Cooks do it. But programmers seldom do it. What is it?
It is good Performance Evaluation of either an existing system or of
its design.
If an engineer estimates the need for steel in a building as a hundred
tons when two hundred are called for, he will be laughed at (or worse).
If a cook doubles the correct amount of spices in a dish, he is likely
to lose clients. But if a programmer makes estimates that are orders of
magnitude off, he feels that nothing extraordinary has happened and
many other programmers are likely to feel the same.
Yet it is not too difficult to come up with performance estimates
of times and sizes that are
fairly accurate (from 50% off to only a few percent off). Throughout this
course I do simple computations, often no more complicated than additions and
multiplications, to estimate some performance factor of a system. In your
first homework you used a complex method of performance evaluation:
discrete-event simulation. Other methods of performance evaluation are
analytical; some involve sophisticated math like Markov models, others
are simple, as we shall now see.
Saturation and Bottlenecks
We say that a server is saturated, or that it has reached
saturation when its utilization is 100% (i.e. it is busy
all the time). For example, if a disk takes 10msec to service a request,
it will saturate when it receives 100 requests per second. If a system
has many servers, we can then talk of its bottleneck, that is, of
the server that will saturate first as the load on the system increases.
Once the bottleneck saturates, the system as a whole saturates, i.e. it will
not be able to handle any greater load. The bottleneck thus determines the
throughput (number of jobs completed per unit of time) of the
system: bottleneck and system have the same throughput. [This discussion
assumes that all jobs have the same characteristics.]
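The saturation rate of a single server follows directly from its service
time. A quick check of the disk example in Python (the variable names are
mine):

```python
# A server with service time S seconds per request is busy all the time
# (utilization 100%) once requests arrive at rate 1/S per second.
S = 0.010                  # disk service time: 10 msec per request
saturation_rate = 1 / S    # requests per second at which the disk saturates
print(saturation_rate)     # 100 requests per second
```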
Example 1: In the following system
we can execute jobs. Each job first does READ, then moves to PROCESS, then
moves to PRINT. These three phases are independent of each other in the
sense that a job can be reading while another is processing and another is
printing. Assume that a job takes 1 second to read, 4 seconds to process, and
2 seconds to print. Question: What is the (best possible) response time?
Answer: 7 seconds.
Question: What is the bottleneck in this system?
Answer: The PROCESS.
Question: How many jobs can be completed per unit of time (at most)?
Answer: 1/4 of a job per second, i.e. 15 jobs per minute.
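The three answers above can be computed mechanically from the stage times
(a small sketch; the stage names come from the text, the helper variables
are mine):

```python
# Bottleneck analysis of the three-stage READ/PROCESS/PRINT pipeline.
stage_times = {"READ": 1.0, "PROCESS": 4.0, "PRINT": 2.0}  # seconds per job

# Best-case response time: a lone job passes through each stage in turn.
response_time = sum(stage_times.values())            # 7.0 seconds

# The bottleneck is the slowest stage; it limits the throughput.
bottleneck = max(stage_times, key=stage_times.get)   # "PROCESS"
throughput = 1.0 / stage_times[bottleneck]           # 0.25 jobs per second

print(response_time, bottleneck, throughput)
```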
Example 2: In the following system
jobs require a total of 100msec of compute time and require 20
operations on the disk whose service time is 10msec.
Question: What is the response time? Answer: 300msec.
Question: What is the
bottleneck in this system? Answer: The disk. Question: What is the throughput
of the system? Answer: 5 jobs per second.
Example 3: In the following system
jobs require a total of 100msec of compute time, require 20 IO operations
on Disk 1 (whose service time is 10msec), and 50 IO operations on Disk 2
(whose service time is 5msec).
Question: What is the response time? Answer: 550msec.
Questions: What is the bottleneck? Answer:
Disk 2. Question: What is the throughput? Answer: 4 jobs per second.
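Examples 2 and 3 follow the same pattern: each device's total time per job
is (number of visits) times (service time per visit), what operational
analysis calls the service demand. A small sketch organizing both examples
this way (the function and variable names are mine):

```python
# Per-device service demand (visits * service time, in msec) determines
# everything: response time is the sum of demands, the bottleneck is the
# device with the largest demand, and throughput is 1 / (bottleneck demand).
def analyze(demands_ms):
    response = sum(demands_ms.values())            # msec, no queueing
    bottleneck = max(demands_ms, key=demands_ms.get)
    throughput = 1000.0 / demands_ms[bottleneck]   # jobs per second
    return response, bottleneck, throughput

# Example 2: 100 msec of CPU, 20 disk operations at 10 msec each.
ex2 = analyze({"CPU": 100, "Disk": 20 * 10})
# Example 3: 100 msec CPU, 20 ops at 10 msec on Disk 1, 50 at 5 msec on Disk 2.
ex3 = analyze({"CPU": 100, "Disk 1": 20 * 10, "Disk 2": 50 * 5})

print(ex2)   # (300, 'Disk', 5.0)
print(ex3)   # (550, 'Disk 2', 4.0)
```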
Little's Law
Little's Law applies to the single server queueing system. It is the equation:
N = a * T
where:
- N is the number of jobs in the queueing system
- a is the arrival rate (the number of jobs that arrive to the system per
unit of time), and
- T is the Response or TurnAround Time for a job (total time from arrival to
departure from the system).
For example, if at the supermarket I notice that a teller always has 4
customers and that customers arrive every 2 minutes, then I know that
customers, on average, will be in the system for 8 minutes.
Little's Law has an intuitive foundation and it can be proven under quite
general assumptions about the arrival rate of jobs, the service rate, and
the scheduling discipline. One can use it reliably.
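The supermarket numbers check out directly against N = a * T:

```python
# Little's Law: N = a * T, so T = N / a.
N = 4.0          # average number of customers at the teller
a = 1.0 / 2.0    # arrival rate: one customer every 2 minutes
T = N / a        # average time a customer spends in the system
print(T)         # 8.0 minutes
```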
Response Time in Round Robin Scheduling
We have seen that RR scheduling involves a quantum q. A running
process, after q seconds, is pre-empted and
placed at the end of the ready queue.
Let's assume that a process, upon completion of a quantum, has probability
r of having to run again (thus, probability 1-r of being terminated).
r is often called a Routing Probability ("route" as in which direction
to go).
How long, in total, will this program run? We can represent this system
with a picture:
If we define g(i) as the probability that a job goes exactly i times through
the server, we have:
g(i) = (1-r)*r^(i-1)
and the expected service time for the process, E(S), will be the weighted
average of the time used for i passes, i*q, times the probability of
that many passes, g(i). That is, E(S) is the sum for i from 1 to infinity
of i*q*g(i); since the sum for i from 1 to infinity of i*r^(i-1) equals
1/(1-r)^2, this gives:
E(S) = q*(1-r)/(1-r)^2 = q/(1-r)
A more difficult question, left unanswered, is how long it will take
in this system for a job to terminate in the case that m jobs
arrive simultaneously. [Maybe it is m*q/(1-r)?]
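The closed form q/(1-r) can be checked numerically by truncating the
infinite sum at a large index (the sample values of q and r are mine):

```python
# Check E(S) = q/(1-r) for round-robin with geometric routing probability.
# g(i) = (1-r)*r**(i-1) is the probability of exactly i passes through
# the server; each pass costs one quantum q.
q, r = 0.1, 0.75   # quantum (seconds) and routing probability, sample values

# Truncate the infinite sum at i = 2000; the tail is negligible for r < 1.
expected = sum(i * q * (1 - r) * r ** (i - 1) for i in range(1, 2000))
closed_form = q / (1 - r)

print(expected, closed_form)   # both are 0.4 seconds (up to rounding)
```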
Operational Analysis
We can analyze the behavior of Queueing Networks (i.e. networks of queues
and their servers) using fairly simple means. We will examine this topic by
example.
Example 1
We have the
following queueing network which represents a computer system with a CPU
and a disk. When a job enters the system it goes to the CPU, whence it goes
either to disk (an IO request) or terminates. Upon completion of the IO
operation the job goes back to the CPU and the cycle repeats.
X0 represents the number of jobs entering the system per unit of time;
X1 represents the number of jobs exiting the system per unit of time;
X2 represents the number of jobs going to disk per unit of time;
X3 represents the number of jobs going to the CPU per unit of time.
At steady state (i.e. when the system displays the same behavior over long
periods of time) we can assume that X0=X1. By observation we determine that
N jobs in total are present in the system at a time. The service times
at the CPU and at the disk are respectively Sc and Sd.
Here are the relations that will hold in this system:
- X0+X2 = X3, by flow continuity: at the fork after the CPU, the X3 jobs
leaving it either go to the disk (X2) or exit (X1), and X1 = X0
- X2 = r*X3, where r is the routing probability that a job leaving the CPU
goes to the disk
- N = X0*R, where R is the Response Time for this system, because
of Little's Law.
Given the values of some of these variables, we can use these relations
to compute the values of the remaining ones.
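For instance, with sample numbers (the values of X0, r, and N below are
mine, not from the text), the relations pin down all the flows and the
response time:

```python
# Solving the Example 1 relations for sample values.
X0 = 2.0   # jobs entering (and, at steady state, leaving) per second
r = 0.9    # probability that a job leaving the CPU goes to the disk
N = 5.0    # observed average number of jobs in the system

# From X3 = X0 + X2 and X2 = r*X3:  X3 = X0 + r*X3, so X3 = X0/(1-r).
X3 = X0 / (1 - r)   # CPU visit rate: 20 per second
X2 = r * X3         # disk visit rate: 18 per second
R = N / X0          # response time, by Little's Law: 2.5 seconds

print(X3, X2, R)
```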
Example 2
Example 2 is very similar to Example 1. But now the users of the system are
all sitting at terminals. Each user request constitutes a job.
We assume:
- Each user request generates v = 20 disk requests
- Disk utilization u is 50%
- Service Time at the disk is Sd = 25ms
- There are M terminals, M = 25
- When a user receives a response, the user thinks for Z = 18 seconds before
submitting the next request
Question: What is the response time R? We use the equations:
- M = (R+Z)*X0, by Little's Law applied to the whole loop of users (each
user's cycle, response time plus think time, lasts R+Z)
- X2 = v*X0, by definition of v, X0, X2
- u = X2 / (1/Sd) = X2*Sd, by definition of u, X2, Sd
Then we easily get our response time:
- X2 = u/Sd = 0.5/0.025 = 20 disk requests per second
- X0 = X2/v = 20/20 = 1 request per second
- R = M/X0 - Z = 25/1 - 18 = 7 seconds
Notice that it is easy to determine the maximum possible value of X2, the
saturation value, that is reached when the disk is fully utilized.
Since the service time of the disk is 25ms, the maximum value of X2 is 40
[since in a second we can have at most 40 disk operations].
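The whole chain of Example 2, including the saturation value of X2, fits in
a few lines (variable names follow the text):

```python
# Interactive system of Example 2: M terminals, think time Z, v disk
# requests per user request, disk service time Sd, disk utilization u.
M = 25         # terminals
Z = 18.0       # think time, seconds
v = 20         # disk requests per user request
Sd = 0.025     # disk service time, seconds
u = 0.5        # observed disk utilization

X2 = u / Sd         # disk throughput: 20 requests per second
X0 = X2 / v         # system throughput: 1 user request per second
R = M / X0 - Z      # response time, from M = (R+Z)*X0: 7 seconds
X2_max = 1 / Sd     # saturation value of X2: 40 requests per second

print(X2, X0, R, X2_max)
```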
As you have seen from these examples, simple performance evaluation is
very feasible, using intuition and simple math.
ingargiola.cis.temple.edu