CIS 307: Introduction to Distributed Systems I
[SISD,SIMD,MISD,MIMD],
[Computational Speedup],
[Some Characteristics of Communication Primitives],
[Network OS vs Distributed OS],
[OSI Architecture],
[TCP/IP Architecture],
[Internet Addresses],
[Information in the IP Header],
[Information in TCP Header],
[Sliding Window Algorithm],
[Ethernet],
[More Readings on TCP/IP Architecture],
[Client-Server Architecture],
[Intranets]
These notes are additions/comments/summaries of material
covered in Chapters 9 and 10 (you are not required to know
anything on Group Communication)
of Tanenbaum's textbook.
A classification often used for computer systems is due to Flynn:
- SISD: Single Instruction, Single Data: it is the traditional
computer, where a single instruction is executed at a time on
scalar individual values.
- SIMD: Single Instruction, Multiple Data: a Vector Machine.
It can, for example add in parallel the corresponding elements of two
vectors.
- MISD: Multiple Instructions, Single Data: Not a popular choice!
- MIMD: Multiple Instructions, Multiple Data, i.e. the concurrently executing
instructions each operate on their own data. Some distinguish MIMD systems into
Multiprocessor Systems, which have processors that
share memory, and Multi-Computers, i.e. systems that
consist of computers that communicate through switches rather than
shared memory.
Another terminology used to characterize how multiple processors in a system
may share memory is:
- Shared-Nothing, where each processor has its own private
memory and communicates with other processors only through messages.
- Shared-Global, where each processor has some private
memory plus some globally shared memory.
- Shared, where all memory is shared by all processors.
Clearly there is a limit to how much memory sharing can take place. The maximum
speed of signals in practice is 200,000 kilometers per second, i.e.
20 centimeters (less than a foot) per nanosecond. Thus computers that share
memory must be fairly close to each other.
An intermediate position, becoming popular, is to have
Clusters of processors sharing memory communicating by
messages with other clusters.
Finally you may hear of UMA systems where you have
Uniform Memory Access and of NUMA systems, where you
have Non-Uniform Memory Access.
We do not consider these systems to be Distributed Systems.
A distributed system [for us] consists of computer systems that are
connected by networks. Those other systems are parallel computers.
[Of course, there are other kinds of "things" that exhibit parallelism:
think of superscalar
computers (for instance the Pentium), where several ordinary instructions
can be issued and executed simultaneously.]
Suppose we run a program using a single processor and it takes time T(1).
We then run the program using n processors (we assume the program is written
to take advantage of the available number of processors) and it takes
time T(n). What is the relationship between T(1), T(n), and n? For example,
if the program takes 100 sec on one processor, how long should it take on
four processors? Maybe 25 sec? In fact T(n) at best should be T(1)/n.
[Cases where T(n) is less than T(1)/n, i.e. in our example less than 25
seconds, are called cases of superlinearity. Such cases can be
explained and analyzed separately in terms of OS or computational
characteristics. For example, a large program split into 4 parts running
independently may have in each part a smaller, more easily accommodated working
set.]
We call Computational Speedup s the ratio:
s = T(1)/T(n)
and Efficiency e the ratio:
e = s/n
Amdahl has suggested that any computation can be analyzed in terms
of a portion that must be executed sequentially, Ts, and a portion
that can be executed in parallel, Tp. Then
T(n) = Ts + Tp/n
This is known as Amdahl's Law. It says that we can improve only the
parallel part of a computation. Going back to our example, if the computation
requires 40 seconds of sequential computation and 60 of parallel computation,
then the best that T(n) will ever be is 40sec, even for n growing to infinity.
In practice there are many problems where the sequential components are minimal
and thus these problems can be effectively parallelized. In general though
speedup due to parallelism tends to be not too great. Some people suggest
that the speedup grows with the square root of n, i.e., if we quadruple
the number of processors, we only double the performance. [Notice that
in this case efficiency will change as 1 over the square root of n, i.e.
the more the processors, the less efficiently they will be used.]
The maximum speedup assuming Amdahl's Law is for n going to infinity.
That is
s = (Ts + Tp)/Ts = 1 + Tp/Ts
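The following is a minimal sketch, in Python, of these formulas applied to the
example above (Ts = 40 sec of sequential work, Tp = 60 sec of parallelizable
work); the function names are invented for illustration:

    # Amdahl's Law with the numbers from the example above.
    def t(n, ts=40.0, tp=60.0):
        """Running time on n processors according to Amdahl's Law."""
        return ts + tp / n

    def speedup(n, ts=40.0, tp=60.0):       # s = T(1)/T(n)
        return t(1, ts, tp) / t(n, ts, tp)

    def efficiency(n, ts=40.0, tp=60.0):    # e = s/n
        return speedup(n, ts, tp) / n

    for n in (1, 4, 16, 1000):
        print(n, t(n), round(speedup(n), 2), round(efficiency(n), 3))
    # As n grows, T(n) approaches Ts = 40 sec and the speedup approaches
    # (Ts + Tp)/Ts = 100/40 = 2.5, while the efficiency keeps dropping.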
Here are some characteristics of communication primitives. Naming,
i.e. how is a name associated with a communicating entity? Some possibilities
are: the name is hardwired, the name is chosen at random, there is a
name server (i.e. somebody responsible for giving names). Blocking,
i.e. does the sender wait until the message is sent, or does it just enqueue
it and go on? This is the distinction between synchronous and
asynchronous calls [semi-synchronous calls are asynchronous calls
where a signal is received at the time the message is actually sent].
Buffering, i.e. what to do when a message
arrives and there is no waiting receiver.
Reliability, i.e. whether messages are or not acknowledged.
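As a toy illustration of the blocking and buffering issues (not how any
particular OS implements its primitives; the class and function names below
are invented), a synchronous send waits for the receiver while an
asynchronous send just enqueues the message:

    # Toy sketch: synchronous send waits for delivery, asynchronous send
    # just enqueues the message in a buffer and goes on.
    import queue, threading

    class Channel:                                      # invented name
        def __init__(self, capacity=10):
            self.buffer = queue.Queue(maxsize=capacity) # buffering for messages
                                                        # with no waiting receiver
        def send_async(self, msg):
            self.buffer.put(msg)              # returns at once (unless full)

        def send_sync(self, msg, delivered):
            self.buffer.put(msg)
            delivered.wait()                  # block until the receiver signals

        def receive(self):
            return self.buffer.get()

    ch = Channel()
    done = threading.Event()
    threading.Thread(target=lambda: (print("got", ch.receive()), done.set())).start()
    ch.send_sync("hello", done)   # returns only after the receiver got it
    ch.send_async("world")        # returns immediately; message waits in buffer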
A Network Operating System is a network of computers with possibly
different operating systems that are able to easily communicate with each
other. Agent processes take care of the intercomputer communication. In a
NOS 1) each computer has a local OS, 2) users work on specific computers,
3) there is no transparency, and 4) there is no extra reliability. Example: SunOS.
A Distributed Operating System is a network of computers
that appears to the user as a single computer and manages all the resources
of the physical computers. The user is not aware of where computations or
data are located. The system is intended to have high reliability. Example:
Amoeba.
The textbook covers this topic. Here are a few observations. Standards
and Protocols are used to define representation and interaction modes
and make certain functionalities generally available. Standards and protocols
are continuously being introduced. They fit within general frameworks called
Architectures. Thus the OSI architecture is not a specific set of
protocols and standards; it is the definition of the functionality of protocols
and standards that should be used. Of course, in practice people tend to
associate architectures with their most popular standards and protocols.
Stallings in his operating systems book [MacMillan 1992] has the
following simple characterization for the OSI Layers:
- Physical:
- Concerned with transmission of unstructured bit stream over physical link;
involves such parameters as signal voltage level and bit duration; deals
with mechanical, electrical, and procedural characteristics to establish,
maintain, and deactivate physical link.
- Data Link:
- Provides for the reliable transfer of data across the physical link;
sends blocks of data (frames) with the necessary synchronization, error
control, and flow control.
- Network:
- Provides upper layers with independence from the data transmission
and switching technologies used to connect systems; responsible for
establishing, maintaining, and terminating connections.
- Transport:
- Provides reliable, transparent transfer of data between end points;
provides end-to-end error recovery and flow control.
- Session:
- Provides the control structure for communication between applications;
establishes, manages, and terminates connections (sessions) between
cooperating applications.
- Presentation:
- Performs transformations on data to provide a standardized application
interface and to provide common communication services; examples:
encryption, text compression, reformatting.
- Application:
- Provides services to users of the OSI environment; examples: transaction server, file transfer service, network management.
There is a fundamental difference between the lowest three levels
(the communication subnet) and the
top four levels of the OSI architecture.
The bottom layers are between directly connected hosts
or involve all the hosts in a path from sender to receiver.
The top four layers are end-to-end protocols, that is, the communication
is stated in terms of only the original sender and the final destination,
independent of how many intermediate hosts are traversed.
Intermediate nodes do not participate at all in the processing of the
higher level protocols; to them it is just data. [Think in terms of overhead:
in the source and target nodes protocols at all layers are processed.
In the intermediate nodes only the protocols in the bottom three layers are
processed.]
This has a direct
impact on efficiency: for example, error checking in protocols at these levels
is done only at the sender and receiver, not at the intermediate hosts.
Each message (or frame, or packet) consists of data being transmitted
plus information required by the protocol for addressing, error
detection, etc. This extra information appears as a header before that data
and (may be) a trailer after the data, i.e. the data is encapsulated
in the message. [Not all messages have both a header and a trailer.
Usually the trailer is not present.]
The message sent at layer i will be transmitted as data by the layer below it.
Assuming that the layer below can transmit this data as a single message,
the layer-i message, header and all, becomes the data portion of the message
at the layer below.
Note that the headers and trailers constitute transmission overhead, reducing the
utilization of the bandwidth of the communication channel. Of course this
is only part of the communication overhead: retransmissions and
acknowledgements further reduce bandwidth.
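A toy sketch of encapsulation as byte concatenation (the header and trailer
contents below are fake placeholders, not real protocol formats):

    # Each layer wraps the data of the layer above between its own header
    # and (possibly) trailer.  The headers here are fake placeholders.
    def encapsulate(data, header, trailer=b""):
        return header + data + trailer

    app_data  = b"GET /index.html"                            # application data
    tcp_seg   = encapsulate(app_data,  b"[TCPhdr]")           # transport layer
    ip_packet = encapsulate(tcp_seg,   b"[IPhdr]")            # network layer
    frame     = encapsulate(ip_packet, b"[ETHhdr]", b"[CRC]") # data link layer

    print(frame)   # b'[ETHhdr][IPhdr][TCPhdr]GET /index.html[CRC]'
    # All the headers and the trailer are overhead: only the innermost
    # bytes are the data the application wanted to send.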
Communication can be connection-oriented or connectionless.
In the former case a connection is established between the communicating
agents and on that connection messages are exchanged. In the latter instead
each message is treated as a free standing entity (datagram).
Related concepts at the communication level are circuit switched
and packet switched. In the former communication between interlocutors
always follows the same path [this may be "forever" or it may be only
for a single connection (a virtual circuit)].
In the latter individual packets follow their own independent routes.
Another related concept is a session.
It consists of one or more connections.
For example, a program on machine A may be involved in communication
with a program on machine B when the connection drops due to communication
problems. When communication is reestablished the programs continue from
where they were in the session, using a new connection. An example of a session
is the interactions during a remote procedure call. The requestor has to send
the request, possibly in multiple messages, the receiver collects them,
processes the request, then sends back the results. All in one session.
The transmission units at different layers may be of different maximum sizes.
We have here the same distinction that exists between "logical" and "physical"
records in file systems. One message at a layer may have to be fragmented over
multiple messages at a lower layer. [This is called
Fragmentation].
A layer may be able to accept messages of length N,
but the implementation of the layer in terms of a protocol
at the layer below may accept only messages of length M, with M < N.
Thus the original layer must be able to fragment the original
message into a number of messages at the lower level. These fragments may then
be sent across different routes, and the receiver will have to worry about
missing fragments and incorrect arrival order when recombining them into
the original message.
This problem occurs
in particular between the network layer (IP protocol) and the data link layer,
where the IP packet needs to be split into a number of fragments.
The opposite situation may also occur: At the data link layer with
ethernet we have messages of up to 1526 bytes. At the network layer
we may have a limit of 128 bytes. [Or the data link unit may be 53 bytes...]
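A minimal sketch of fragmentation and reassembly, with a made-up maximum size
and with offsets counted in bytes (real IP counts fragment offsets in units of
8 bytes):

    # Toy fragmentation/reassembly sketch.
    def fragment(message, max_size):
        """Split a message into (offset, more_fragments, data) triples."""
        frags = []
        for off in range(0, len(message), max_size):
            data = message[off:off + max_size]
            more = off + max_size < len(message)   # do more fragments follow?
            frags.append((off, more, data))
        return frags

    def reassemble(frags):
        # Reorder by offset and concatenate.  A real receiver must also
        # detect missing fragments and give up after a timeout.
        return b"".join(data for off, more, data in sorted(frags))

    msg = b"x" * 300
    frags = fragment(msg, 128)                 # offsets 0, 128, 256
    assert reassemble(reversed(frags)) == msg  # arrival order does not matter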
Routing is the process by which a message moves from a
sender to a receiver across multiple intermediate hosts. It is a very complex
activity worthy of its own separate course. Part of the complexity is that
we want to have a system that keeps on working while hosts and
links fail and come on line at their own pace. Routers are placed
between neighboring networks to decide which of the incoming packets should
be kept on the local network and which should be moved to the other
network. It is not an optimal process: packets keep a hop count that
is incremented with each transfer. When it gets too big, as in a loop, the
packet is discarded.
A number of issues should be kept in mind when analysing a protocol
and its implementation. Among them:
- Framing: How do we recognise the beginning
and end of a message (or packet, or frame, or ..);
- Flow Control:
How do we make sure we don't send so much information that the receiver
cannot handle it and has to "drop it on the floor";
- Multiplexing: How can a single communication channel be shared by
more than one conversation;
- Addressing: How do interlocutors address each other?
[As we have already
seen, host interfaces are known by their IP address (a 32 bit integer);
unfortunately in ethernet the address understood by the hardware is the
Ethernet address (a 48 bit integer) of the connector; thus there is need
of a way to translate from IP to Ethernet addresses. In general, this
requires an address resolution protocol; a minimal table-lookup sketch
follows this list.]
- Error Detection and Correction: what we do to detect if an error has
occurred in transmission and how to correct it.
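Here is the minimal table-lookup sketch of address resolution promised above.
The addresses are made up, and the broadcast step is only a placeholder for
what ARP actually does on the wire:

    # Toy ARP-style cache mapping IP addresses to Ethernet (hardware) addresses.
    arp_cache = {
        "155.247.190.1": "08:00:2b:1c:4f:a1",   # invented Ethernet address
    }

    def broadcast_arp_request(ip_address):
        # Placeholder: real ARP broadcasts a request on the local network
        # and waits for the owner of ip_address to answer with its address.
        raise KeyError("no ARP reply for " + ip_address)

    def resolve(ip_address):
        if ip_address not in arp_cache:
            arp_cache[ip_address] = broadcast_arp_request(ip_address)
        return arp_cache[ip_address]

    print(resolve("155.247.190.1"))   # found in the cache, no broadcast needed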
Examples of protocols used at the various levels:
Application layer: telnet, ftp, finger, ..
Presentation layer: XDR
Session layer: RPC
Transport layer: TCP (connection oriented), UDP (connectionless)
Network layer: X.25 (connection oriented), IP (connectionless), ICMP (Internet Control Message
Protocol - connectionless)
Data link layer: HDLC, ethernet, ATM
Physical layer: RS-232-C, ethernet
The TCP/IP architecture was developed mainly in the US. It is the one that
has the largest number of users. It represents pragmatic solutions to
problems as they arose. It involves only four layers:
The application layer (same as
in OSI), the end-to-end layer (TCP or UDP or ...), the internet layer
(IP, or ICMP, or ..),
and the net access layer (ethernet, or ATM, or PPP, or SLIP, or ....).
Internet addresses are used in the network layer (and above) protocols.
For example they are used in the IP protocol.
An internet address is a 32-bit integer with the format
described in the following figure.
[Figure: formats of class A, B, C, and D internet addresses.]
As you can see internet addresses usually have two parts, one identifying
a network, and the other identifying a computer on that network. The class D
is scarcely used. It is supposed to be for multicasting, i.e. for sending
messages to multiple recipients. Classes A, B, and C have, respectively,
a few networks each with many hosts, a reasonable number of networks each
with a reasonable number of hosts, and a lot of networks each with few hosts.
To facilitate the use of internet addresses and the routing of messages
in a large local organization, it is possible to subdivide the host address,
say in the B class, into two parts, one for a subnetwork, the other for
the host. [You may want to use subnetworks to reduce traffic on the
subnetworks and to
facilitate their management.]
An internet address identifies a connection of a computer to a network.
Thus if a computer is connected to multiple networks it will have multiple
internet addresses.
Since an internet address is only 32 bits,
people are afraid that we may be
running out of such addresses and are working to create
a wider class of internet addresses.
Since it is hard to remember a 32 bit integer, people are accustomed to
representing it as a sequence of four numbers, one for each octet (byte).
For example snowhite is 155.247.190.1. And since this too is
hard to remember, people use names such as snowhite.cis.temple.edu
and require a name service to map one kind of identification into the other.
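A small sketch that parses a dotted-quad address and reports its class and
its network/host split under the classful scheme described above (class E,
240 and above, is lumped with class D here):

    # Classify a dotted-quad internet address and split it into network
    # and host parts (classful addressing).
    def classify(dotted):
        o = [int(x) for x in dotted.split(".")]
        addr = (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3]
        if o[0] < 128:                       # 0xxxxxxx : class A
            return "A", addr >> 24, addr & 0xFFFFFF
        elif o[0] < 192:                     # 10xxxxxx : class B
            return "B", addr >> 16, addr & 0xFFFF
        elif o[0] < 224:                     # 110xxxxx : class C
            return "C", addr >> 8, addr & 0xFF
        else:                                # 1110xxxx : class D (multicast)
            return "D", None, None

    print(classify("155.247.190.1"))   # ('B', 39927, 48641): snowhite is
                                       # on a class B network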
Internet addresses are not the only form of addresses used on the internet.
For example ethernet (physical and data link layer) uses 48 bit addresses
that are unique worldwide.
Hence a conversion problem arises: how to go from one kind of address
to the other and vice versa. Two protocols, ARP (Address Resolution Protocol)
and RARP (Reverse ARP), are used to this end.
The Internet Protocol (IP) is a connectionless protocol [messages are
sent individually, not after a connection has been established]
with "best-effort"
delivery [i.e. delivery will be tried but not guaranteed]. Below is a
description of the fields in the IP header.
- Version Number:
- The version number of IP (currently version 4).
- Length:
- Since the header is of variable length (a minimum of five words,
the rest are optional),
it gives the length of the header as a number of 32 bit words.
- Type of Service:
- How urgent is this message: Precedence, normal delay, low delay, ..
The default is 0.
- Packet Length:
- The total length of the packet in bytes. The maximum length of an IP packet is 65,535 bytes, but it is usually much less.
- Identification:
- The identity of this packet. It is unique for a sending host.
- Flags:
- Two possibilities: DF (don't fragment) and MF (more fragments). It is used
to identify fragments in decomposed messages.
If the DF flag is set then we cannot fragment a long message coming from
the network layer [we have to discard it].
For example IP accepts messages of up to 64KB.
Ethernet frames instead carry only up to 1500 bytes of data.
Thus an IP message will be fragmented before transmission using ethernet
in the data link and physical layers. But this will take place only if
DF is not set.
- Fragment offset:
- Once an IP packet is split, each fragment receives a copy of the IP
header, and the fragment offset specifies the relative position
of the fragment in the original message. Offsets are measured in
multiples of 8 bytes [i.e. 1 means 8 bytes].
- Time to Live:
- Usually it is measured in "hops", i.e. the number of systems to traverse
at most before reaching the destination. A message is discarded if it
does more hops than its stated time-to-live.
- Transport Protocol:
- It identifies the protocol used at the next layer up [transport].
It usually is TCP or UDP.
- Header Checksum:
- The 1's complement of the 16-bit 1's complement sum of the 16-bit words in the header [a sketch of this computation follows this list].
- Source and Destination Addresses:
- Their internet addresses.
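Here is the sketch of the header checksum computation promised above; the
sample 20-byte header is just an example, with the checksum field itself
zeroed out before computing:

    # The header checksum: the 1's complement of the 16-bit 1's complement
    # sum of the 16-bit words of the header, computed with the checksum
    # field itself set to zero.
    def ip_checksum(header):
        total = 0
        for i in range(0, len(header), 2):
            total += (header[i] << 8) | header[i + 1]
            total = (total & 0xFFFF) + (total >> 16)   # end-around carry
        return (~total) & 0xFFFF

    # A sample 20-byte header (checksum field, bytes 10-11, zeroed):
    hdr = bytes([0x45, 0x00, 0x00, 0x3C, 0x1C, 0x46, 0x40, 0x00,
                 0x40, 0x06, 0x00, 0x00, 0xAC, 0x10, 0x0A, 0x63,
                 0xAC, 0x10, 0x0A, 0x0C])
    print(hex(ip_checksum(hdr)))   # 0xb1e6; a receiver recomputing over the
                                   # full header (checksum included) gets 0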
After the IP header there may be optional fields. Particularly
interesting is
the fact that a packet may contain directions on which path to follow,
request the nodes it goes through to inform the sender that they
have received the packet, and even to return a timestamp of when the packet
was received. Thus one can follow a packet as it moves, and see at what
speed it goes.
Note that sufficient information is provided in IP packets to deal with
the problems of fragmentation [header length, packet length, offset
of fragment, if more fragments follow].
ICMP, which is also a network layer protocol, uses IP packets to transmit
control information. For example, it may inform that a host is unreachable,
or that a node is unable to do buffering, or it may request the local time of
a node, or report such time, or inform that a datagram was thrown out because
of too many hops.
The Transmission Control Protocol (TCP) is a connection oriented
[thus sender and receiver are connected as in a virtual circuit],
bidirectional, stream oriented [as opposed to message oriented,
i.e. users think in terms of sending and receiving a stream of
octets; the implementation will actually use messages, called segments],
reliable transmission protocol at the transport layer.
The same format for the header is used in both directions in a
connection.
Here is a description of some of the fields in the header of a segment.
- Source and Destination Ports:
- They are 16 bit numbers. A port number is local to a specific
host, i.e. each host has its own set of 65,535 ports and different hosts
may use the same port numbers.
In unix /etc/services keeps a list of
specific uses for some of these ports.
For example, HTTP uses 80, Telnet
uses 23, FTP uses 21.
- Sequence and Acknowledgement Numbers:
- These numbers are local to a connection between two nodes, i.e.
they are meaningful only during the life of that connection. The initial
sequence number is agreed between the sender and the receiver when the
connection is set up.
For example, if two nodes A and B are communicating, then A as sender
may choose initially the number 200 and
B as sender may choose initially the number 500.
Then B will acknowledge messages relative to 200 and A will acknowledge
messages relative to 500. More on this later.
- Data Offset:
- Length of the TCP header, counted in 32-bit words.
- Flags:
- There are a variety. Among them:
- ACK: Confirms that the acknowledgement field is valid
- SYN: Request to setup a connection. It must be acknowledged.
- FIN: Request to shutdown. It must be acknowledged.
- Window Size:
- It specifies the size of the receiver's available buffer (called
window).
A way of sending data reliably is the following: each time the sender
sends
a message, it waits for confirmation, and only then sends the next message.
Unfortunately this can give rise to long transmission times.
An alternative is to send multiple messages
as long as one does not get too far ahead of the acknowledgements. This
common sense observation is the basis for the Sliding Window Algorithm.
It goes as follows. Each partner sets up and shares a window size.
Say A sets window to 100 and B to 200. Then A(B) feels free to send
up to 100 (200) octets without receiving acknowledgement. Then if A
has sent 80 octets and received acknowledgement for 20, it knows it can send
safely 40 more octets. And B reasons similarly.
Note that one packet can carry some data and simultaneously acknowledge
a previously received message.
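A minimal sketch of the sender-side bookkeeping just described (window
accounting only; no actual transmission, timers, or retransmission):

    # Toy sketch of the sender side of the sliding window scheme: never
    # have more than `window` unacknowledged octets outstanding.
    class SlidingWindowSender:
        def __init__(self, window):
            self.window = window      # window size agreed with the receiver
            self.sent = 0             # octets sent so far
            self.acked = 0            # octets acknowledged so far

        def can_send(self):
            """How many more octets may be sent right now."""
            return self.window - (self.sent - self.acked)

        def send(self, n):
            n = min(n, self.can_send())
            self.sent += n
            return n                  # octets actually sent

        def acknowledge(self, n):
            self.acked += n

    a = SlidingWindowSender(window=100)   # A's window, as in the example
    a.send(80)                            # 80 octets in flight
    a.acknowledge(20)                     # 20 of them acknowledged
    print(a.can_send())                   # 40 more octets may be sent safely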
Ethernet (standard IEEE 802.3) is the most commonly used protocol at the
physical and data link layers. It uses 48-bit addresses.
The Ethernet packet has up to 1500 octets of
data. These are preceded by 20 octets for the IP header, 20 for the TCP/UDP
header, 6 for the sender address and 6 for the receiver address, and an
initial 8 used for synchronization. The packet is followed by 4 octets,
the CRC sum. Thus there is a 64 octet overhead for each message sent
with ethernet.
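The arithmetic behind that 64-octet figure, and the best-case fraction of the
channel left for user data, assuming a frame carrying a full 1500 octets:

    # The 64-octet overhead quoted above, and the resulting best-case
    # utilization for a frame carrying a full 1500 octets of user data.
    overhead = 8 + 6 + 6 + 20 + 20 + 4   # sync + dst + src + IP + TCP + CRC
    data = 1500
    print(overhead)                             # 64
    print(round(data / (data + overhead), 3))   # 0.959: at best about 96%
                                                # of the bits carry user data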
Many hosts can be on a LAN with ethernet. They all share the same
communication channel, thus multiple nodes may attempt to transmit
at the same time, with possible collisions. A protocol, CSMA/CD
(Carrier Sense, Multiple Access with Collision Detection), is used
to resolve the resulting problems.
The detailed knowledge of the TCP/IP architecture and of its implementations
is of interest mainly to specialists, people who will themselves write
implementations of new protocols or upgrades of old protocols. Usually
system programmers will work above the Socket [or Stream] API.
For those interested in knowing more about TCP/IP and its
implementation there are a number of
wonderful books
by Stevens and others.
Though the OSI (and TCP/IP) architecture
arrives up to the "application layer", much
work these days is happening above that level, in what people call
middleware with things like the
Distributed Computing Environment (DCE)
and the
Common Object Request Broker Architecture (CORBA).
These systems help create an infrastructure to support the construction
and use of distributed applications.
A typical way in which people organize distributed applications is
represented by the Client-Server Model. In this model
a program at some node acts as a Server which provides
some Service, things like a Name Service, a File Service,
a Data Base Service, etc. The Client sends requests to the Server and receives
replies from it. The Server can be Stateful or
Stateless, i.e. it may or may not remember information across different
requests from the same client. The Server can be iterative
or concurrent. In the former case it processes one request
at a time; in the latter it will fork a separate process (or create a separate
thread) to handle each request. Also the Client can be
iterative or concurrent, depending on whether
it makes requests to different servers sequentially or concurrently. The
traditional way of interaction from the client to the server is with
Remote Procedure Calls (RPC).
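A minimal sketch, above the socket API, of an iterative server and a client
for a toy echo service (the port number is an arbitrary, made-up choice;
a concurrent server would instead fork a process or start a thread for each
accepted connection):

    # Toy iterative echo server and client above the socket API.
    import socket

    def iterative_server(port=5007):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("", port))
        srv.listen(5)
        while True:                          # one request at a time: iterative
            conn, addr = srv.accept()
            data = conn.recv(1024)           # the client's request
            conn.sendall(b"echo: " + data)   # the server's reply
            conn.close()

    def client(host, port=5007):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        s.sendall(b"hello")
        print(s.recv(1024))                  # b'echo: hello'
        s.close()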
Note that the way we interact with an object, by calling one of its methods,
is a typical client-server interaction, where the object is the server.
Thus distributed objects fit within the client-server
architecture. Objects may, at different times, play the role of client
or of server to one another.
In the Client-Server Architecture the initiative always lies with the client.
The server only responds to requests [information is pulled
from the server by the client].
A Peer-to-Peer Architecture instead emphasizes that the
interaction initiative may lie with any of the interacting objects.
These days much of the development in distributed applications takes the
form of
intranets. An intranet is a distributed system
providing access to the information systems of an enterprise through the
use of internet, and more specifically web,
technology, usually via web browsers. Central issues are:
- What services are provided [e-mail, group cooperation tools,
access to legacy information systems, access to data bases, workflow
support, ..].
- How to access information in a safe manner, through passwords,
encryption, etc. Different people in an organisation need to receive
different rights, and the granularity of the entities protected may be fine.
Rights must be safely and conveniently granted and revoked.
- How to access from a web server traditional SQL databases.
- How to limit the cost of development and
maintenance of the information systems and the cost of service.
Companies such as Microsoft, Netscape, Oracle, Sun are most active in
this field. Languages, standards, and systems such as Visual Basic, Java,
JavaScript, VBScript, CGI, Perl, HTTP, HTML, Beans, ... are being employed.
The creation of the infrastructure to support intranets is highly technical.
The implementation of specific intranets using existing tools is
fairly low tech.
ingargiola.cis.temple.edu