As you delve into the mysteries of UNIX, you find more and more things that are difficult to understand immediately. One of these things, at least for most people, is the BSD socket concept. This is a short tutorial that explains what sockets are and how they work, and gives sample code showing how to use them.
The socket is the BSD method for accomplishing interprocess communication (IPC). What this means is a socket is used to allow one process to speak to another, very much like the telephone is used to allow one person to speak to another.
The telephone analogy is a very good one, and will be used repeatedly to describe socket behavior.
In order for a person to receive telephone calls, he must first have a telephone installed. Likewise, you must create a socket to listen for connections. This process involves several steps. First you must make a new socket, which is similar to having a telephone line installed. The socket() function is used to do this.
Since sockets can have several types, you must specify what type of socket you want when you create one. One option that you have is the addressing format of a socket. Just as the mail service uses a different scheme to deliver mail than the telephone company uses to complete calls, so can sockets differ. The two most common addressing schemes are AF_UNIX and AF_INET. AF_UNIX addressing uses UNIX pathnames to identify sockets; these sockets are very useful for IPC between processes on the same machine. AF_INET addressing uses Internet addresses, which are four-byte numbers usually written as four decimal numbers separated by periods (such as 192.9.200.10). In addition to the machine address, there is also a port number, which allows more than one AF_INET socket on each machine. AF_INET addresses are what we will deal with here, as they are the most useful and widely used.
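To make the address-plus-port idea concrete, here is a minimal sketch of filling in an AF_INET address structure. The helper name make_address() is hypothetical and chosen only for this illustration; the port is stored with htons() because addresses are kept in network byte order, a topic covered later in this tutorial.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill *sa with an AF_INET address built from a
 * dotted-decimal string such as "192.9.200.10" and a port number.
 * Returns 0 on success, -1 if the string is not a valid address. */
int make_address(const char *dotted, unsigned short port,
                 struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof *sa);      /* clear any padding fields */
    sa->sin_family = AF_INET;       /* Internet addressing scheme */
    sa->sin_port = htons(port);     /* port, in network byte order */

    /* inet_pton() returns 1 only when the text was parsed successfully */
    return inet_pton(AF_INET, dotted, &sa->sin_addr) == 1 ? 0 : -1;
}
```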
Another option which you must supply when creating a socket is the type of socket. The two most common types are SOCK_STREAM and SOCK_DGRAM. SOCK_STREAM indicates that data will come across the socket as a stream of characters, while SOCK_DGRAM indicates that data will come in bunches (called datagrams). We will be dealing with SOCK_STREAM sockets, which are the most common and easiest to use.
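Putting the two choices together, creating a socket might look like the sketch below. The wrapper name make_stream_socket() is an assumption made for this example; the only real work is the socket() call itself.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical wrapper: create an Internet-domain stream socket.
 * Returns the new socket descriptor, or -1 on failure. */
int make_stream_socket(void)
{
    /* AF_INET selects Internet addressing, SOCK_STREAM selects a byte
     * stream, and 0 asks for the default protocol for that pair. */
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        perror("socket");
    return s;
}
```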
After creating a socket, we must give the socket an address to listen on, just as you get a telephone number so that you can receive calls. The bind() function is used to do this (it binds a socket to an address, hence the name).
SOCK_STREAM type sockets have the ability to queue incoming connection requests, which is a lot like having "call waiting" for your telephone. If you are busy handling a connection, the connection request will wait until you can deal with it. The listen() function is used to set the maximum number of requests that will be queued before requests start being denied. Historically this was capped at five on BSD systems; current systems usually allow more, up to a system-defined limit (see SOMAXCONN in <sys/socket.h>). You must call listen() before a SOCK_STREAM socket can accept connections; on current systems accept() fails on a socket that is not listening.
The following function shows how to use the socket(), bind(), and listen() functions to establish a socket which can accept calls:
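One way such an establish() function might look is sketched below. It assumes the socket should take calls on any of the machine's addresses (INADDR_ANY); that choice, the BACKLOG constant, and the error handling are all made for this illustration.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define BACKLOG 5   /* assumed queue length for pending calls */

/* Create a stream socket, bind it to portnum on any local address,
 * and start listening. Returns the socket, or -1 on failure. */
int establish(unsigned short portnum)
{
    struct sockaddr_in sa;
    int s;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local address */
    sa.sin_port = htons(portnum);            /* our "phone number" */

    if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;
    if (bind(s, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(s);                            /* do not leak the socket */
        return -1;
    }
    if (listen(s, BACKLOG) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```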
After you create a socket to get calls, you must wait for calls to that socket. The accept() function is used to do this. Calling accept() is analogous to picking up the telephone if it's ringing. The accept() call returns a new socket which is connected to the caller.
The following function can be used to accept a connection on a socket that has been created using the establish() function above:
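A possible get_connection() is sketched here. Retrying when accept() is interrupted by a signal is a choice made for this example, so that the function keeps waiting if, say, a child process exits while no call is pending.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Wait for a call on listening socket s and answer it.
 * Returns a new socket connected to the caller, or -1 on failure. */
int get_connection(int s)
{
    int t;

    /* NULL, NULL: we do not ask who is calling. Retry if a signal
     * interrupted the wait before any call arrived. */
    do {
        t = accept(s, NULL, NULL);
    } while (t < 0 && errno == EINTR);

    return t;
}
```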
Unlike with the telephone, you may still accept calls while processing previous connections. For this reason you usually fork off a child process to handle each connection. The following code shows how to use establish() and get_connection() to allow multiple connections to be dealt with:
You now know how to create a socket that will accept incoming calls. So how do you call it? As with the telephone, you must first have the phone before using it to call. You use the socket() function to do this, exactly as you did when establishing a socket to listen on.
After getting a socket to make the call with, and filling in the address of the host and port you want to call, you use the connect() function to try to connect to a listening socket. The following function calls a particular port number on a particular host:
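A sketch of such a call_socket() function follows. It looks the host up with getaddrinfo(), which is a choice made for this example rather than something the text prescribes, and it restricts the lookup to AF_INET to match the rest of this tutorial.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Call port portnum on the named host.
 * Returns a connected socket, or -1 on failure. */
int call_socket(const char *hostname, unsigned short portnum)
{
    struct addrinfo hints, *res, *ai;
    char port[6];                   /* "65535" plus terminating NUL */
    int s = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;      /* Internet addresses only */
    hints.ai_socktype = SOCK_STREAM;
    snprintf(port, sizeof port, "%u", (unsigned)portnum);

    if (getaddrinfo(hostname, port, &hints, &res) != 0)
        return -1;                  /* unknown host */

    /* Try each address the lookup returned until one answers. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0)
            continue;
        if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                  /* connected */
        close(s);
        s = -1;
    }
    freeaddrinfo(res);
    return s;
}
```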
This function returns a connected socket through which data can flow.
Now that you have a connection between sockets you want to send data between them. The read() and write() functions are used to do this, just as they are for normal files. There is only one major difference between socket reading and writing and file reading and writing: a single call often transfers fewer characters than you asked for, so you must loop until you have read the number of characters that you want. A simple function to read a given number of characters into a buffer is:
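One way to write that loop is sketched below. The name read_data() and the decision to return a short count when the other end closes early are assumptions of this example.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

/* Read exactly n characters from s into buf, looping over short reads.
 * Returns the number of characters read (less than n only if the other
 * end closed the connection first), or -1 on error. */
ssize_t read_data(int s, char *buf, size_t n)
{
    size_t got = 0;                 /* characters read so far */
    ssize_t r;

    while (got < n) {
        r = read(s, buf + got, n - got);
        if (r > 0)
            got += (size_t)r;       /* partial read: keep going */
        else if (r == 0)
            break;                  /* other end hung up */
        else if (errno != EINTR)
            return -1;              /* real error */
        /* EINTR: interrupted by a signal, just try again */
    }
    return (ssize_t)got;
}
```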
A very similar function should be used to write data; we leave that function as an exercise to the reader.
Just as you hang up when you're through speaking to someone over the telephone, so must you close a connection between sockets. The normal close() function is used to close each end of a socket connection. If one end of a socket is closed and the other tries to write to its end, the write will return an error.
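The effect of writing after the other end has hung up can be seen in the small sketch below. It uses a connected pair of AF_UNIX sockets from socketpair() so that it needs no server, and it ignores SIGPIPE so that the failed write reports an error instead of killing the process. The function name is hypothetical. Across a real network connection the first write after the other end closes may still appear to succeed, with the error showing up on a later write.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Close one end of a connected socket pair, then write to the other.
 * Returns the errno from the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the pair could not be created. */
int write_to_closed_peer(void)
{
    int sv[2];
    int err = 0;

    signal(SIGPIPE, SIG_IGN);       /* report the error, do not die */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    close(sv[1]);                   /* one side hangs up */
    if (write(sv[0], "x", 1) < 0)   /* the other keeps talking */
        err = errno;

    close(sv[0]);                   /* hang up our end too */
    return err;
}
```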
Now that you can talk between machines, you have to be careful what you say. Many machines use differing dialects, such as ASCII versus (yech) EBCDIC. More commonly there are byte-order problems. Unless you always pass text, you'll run up against the byte-order problem. Luckily people have already figured out what to do about it.
Once upon a time in the dark ages someone decided which byte order was "right". Now there exist functions that convert one to the other if necessary. Some of these functions are htons() (host to network short integer), ntohs() (network to host short integer), htonl() (host to network long integer), and ntohl() (network to host long integer). Here "short" means a 16-bit value and "long" a 32-bit value, whatever the size of the C types on your machine. Before sending an integer through a socket, you should first massage it with the htonl() function, and after reading data you should convert it back with ntohl().
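The conversion in each direction can be sketched as follows. The helper names send_u32() and recv_u32() are hypothetical, and uint32_t is used because htonl() and ntohl() operate on 32-bit values.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <arpa/inet.h>

/* Send one 32-bit integer in network byte order.
 * Returns 0 on success, -1 on failure. */
int send_u32(int s, uint32_t value)
{
    uint32_t net = htonl(value);    /* host order -> network order */

    return write(s, &net, sizeof net) == (ssize_t)sizeof net ? 0 : -1;
}

/* Receive one 32-bit integer and convert it to host byte order.
 * Returns 0 on success, -1 on error or early end of data. */
int recv_u32(int s, uint32_t *value)
{
    uint32_t net;
    size_t got = 0;
    ssize_t r;

    /* Loop, since a read may deliver fewer bytes than requested. */
    while (got < sizeof net) {
        r = read(s, (char *)&net + got, sizeof net - got);
        if (r <= 0)
            return -1;
        got += (size_t)r;
    }
    *value = ntohl(net);            /* network order -> host order */
    return 0;
}
```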
If you keep in the habit of using these functions you'll be less likely to goof it up in those circumstances where it is necessary.
Using just what's been discussed here, you should be able to build your own programs that communicate with sockets. As with all new things, however, it would be a good idea to look at what's already been done. While there are not a lot of books describing BSD sockets, one good reference is Unix Network Programming by W. Richard Stevens (Prentice-Hall 1990, ISBN 0-13-949876-1). In addition, you should look at some of the many public-domain applications which make use of sockets, since real applications are the best teachers. One such application is available by anonymous ftp from ftp.std.com in /src/network/msend.1.2.tar.gz.
Beware that the examples given here leave out a lot of error checking which should be used in a real application. You should check the manual pages for each of the functions discussed here for further information. If you have specific questions regarding sockets, please feel free to ask me at email address jimf@world.std.com.