CIS 4307: Unix I

[System Services], [Fork], [Exit], [Wait], [Exec], [Files], [File Blocks], [Fun with Printf], [Fun with Fork], [Fork and Printf]

Here are some scattered notes about Unix. Your reference is W. Richard Stevens's book, Advanced Programming in the UNIX Environment.

The following figure (from Stevens page 168) describes the layout of the (virtual) address space of a process. This space is essentially in two parts: a top part with the stack, and a bottom part with the text, data and heap. Between the two there is unused space, so the two parts can be implemented either as two segments or with two separate page tables. The stack area is usually terminated by a guard, i.e. an area to which the program has no access rights, so that an exception is raised when the program runs out of stack.

By the way, the external variables etext, edata and end mark respectively the end of the text, of the initialized data, and of the uninitialized data (bss) areas (each is the first address past its area), and environ points to the array of environment strings. I wrote a program that prints these addresses, together with the addresses of main, of some external and local variables, and of argv. It produced the following output on my Digital workstation:

        &etext  =       120001610
        &main   =       1200011a0
        &edata  =       140000260
        &end    =       140000280
        &a      =       140000028
        &b      =       140000010
        &j      =       11ffffc48
        &k      =       11ffffc4c
        argv    =       11ffffc68
        *argv   =       11ffffd48
        environ =       11ffffc78
        *environ=       11ffffd4e
and the following output on my Linux machine:
        &etext  =       0x804857e
        &main   =       0x8048400
        &edata  =       0x804975c
        &end    =       0x8049774
        &a      =       0x8048588
        &b      =       0x804965c
        &j      =       0xbffff9d4
        &k      =       0xbffff9d0
        argv    =       0xbffffa44
        *argv   =       0xbffffb4e
        environ =       0xbffffa4c
        *environ=       0xbffffb56

System Services

File Management
open, creat, close, lseek, read, write, dup, dup2, fcntl, ioctl, sync, fsync, ...
IO Multiplexing
select
Process Management
exec, fork, vfork, exit, wait, waitpid, system, ..
Information Services
getpid, getppid, getuid, geteuid, getgid, getegid, getrlimit, ..
Inter Process Communication
Pipes
pipe
Message Queues
mq_open, mq_send, mq_receive, ..
Memory Mapped IO
mmap, munmap
Shared Memory
shmget, shmat, shmdt, ..
Signals
signal, sigaction, kill, alarm, sleep, pause, ..
Sockets
socket, gethostbyname, getsockname, bind, listen, accept, htons, sendto, recvfrom, ..

FORK

A process, the parent, creates another process, the child, with the fork system service:

    #include <sys/types.h>	/* usually these files are in /usr/include */
    #include <unistd.h>

    pid_t fork(void);

    It returns	-1  failure to fork (we are still in the lonely parent)
		0   we are in the child
		>0  the id of the child, we are in the parent
A full copy of the address space of the parent is given to the child. The copy may be lazy, using COW (Copy-On-Write): a page is copied only when its content must start to differ in the parent's and in the child's address space, i.e. when one of them writes to it. (On some Unix systems the vmstat command shows, among other things, copy-on-write statistics.) An exception is vfork, meant for the case where we fork and the child immediately does an exec: the child runs in the parent's address space, and the parent is suspended, until the child calls exec or _exit.

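What happens to a child if its parent terminates first? Answer: it is adopted by the init process, i.e. its parent becomes process 1. Here are three standard processes and their process ids (as described by Stevens; process 2 exists only on some virtual memory implementations):

	0	the scheduler process (swapper), part of the kernel
	1	init
	2	pagedaemon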

When a process terminates, the kernel notifies its parent with the SIGCHLD signal and keeps the child's termination status for it. The parent retrieves this status with the wait and waitpid system calls (or it declares that it is not interested in it by setting the disposition of SIGCHLD to SIG_IGN). What happens between the time a child terminates and the time its parent collects the status? Answer: the child cannot fully terminate; it is said to be in a zombie state. Here is a program that shows how zombies can occur.

Exit, _exit, atexit

    #include <stdlib.h>
    void exit(int status);

    #include <unistd.h>
    void _exit(int status);

    #include <stdlib.h>
    int atexit(void (*func)(void));   /* returns 0 iff OK */
exit is called for normal termination. It is usually a function in the standard C library, not a system service. It calls all the functions registered with atexit, flushes and closes the standard I/O streams, and then calls _exit. The C library (not the kernel) keeps the functions registered with atexit in a stack; ISO C guarantees room for at least 32 of them. At normal termination they are popped one by one from the stack and executed, i.e. in reverse order of registration.

When _exit is executed, it closes all open files, makes init (process 1) the parent of the process's children, releases the locks the process holds, applies any pending semaphore adjustments (SEM_UNDO), and releases its storage. The process then notifies its parent and remains a zombie until the parent collects its status with wait or waitpid. (Process 1 always waits for its children, so the orphans it adopts do not remain zombies.)
The following figure derived from page 164 in Stevens shows how exit and _exit are related.

Wait, waitpid

    #include <sys/types.h>
    #include <sys/wait.h>
    pid_t wait (int *statloc);
		o the return value is the pid of the terminated child
		  (-1 on error, e.g. if there are no children)
		o *statloc receives the termination status of the child
		o If a child is already a zombie, pick up its status,
		  which removes the zombie, and return; otherwise block
		  until a child terminates, then return.
    pid_t waitpid (pid_t pid, int *statloc, int options);
		o if pid > 0, it is the pid of the child we wait for;
		  pid == -1 means any child, pid == 0 any child in the
		  caller's process group, pid < -1 any child in the
		  process group -pid
		o statloc and the return value are as for wait
		o options is 0 or the OR of flags such as
		  WNOHANG    do not block: return 0 if no such child
		             has terminated yet
		  WUNTRACED  also report children that are stopped
		             (used in systems with job control)

Exec

There are a number of system services that we generically call exec services. Here is one such service:

    #include <unistd.h>
    int execve(const char *pathname, char *const  argv[], char *const envp[]);
	/* pathname  identifies an executable file */
	/* argv      pointer to null-terminated array of pointers to null */
        /*           terminated character strings (first is name of called program)*/
	/* envp      pointer to null terminated array of pointers to null */
        /*           terminated character strings */

    The program identified by pathname is executed in place of the current one
    as if it had been called in the usual way, i.e. as

    main(int argc, char *argv[], char *envp[]);
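All exec services replace the image of the calling process with a new image; they return only on failure, with value -1. They differ in how the new image is identified (by a pathname, or by a file name searched in the directories listed in PATH: execlp, execvp), in how it receives its parameters (as individual arguments: execl, execle, execlp; or in a single array argv: execv, execve, execvp), and in how it receives the environment variables (inherited through the variable environ, or passed explicitly as a parameter: execle, execve).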

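The process executing the exec call passes on to the new image information such as:

	o its process id and parent process id
	o its real user id and real group id
	o its current working directory and root directory
	o its file mode creation mask (umask)
	o its open files, except those marked close-on-exec
	o its signal mask and pending signals
	o its resource limits and the time left until an alarm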

One can think of a process executing an exec call as an actor who changes the script being performed.
An example of use of execvp appears in the section on files.

The following picture shows what happens when the Unix shell executes a user command (from Silberschatz, Peterson, and Galvin: Operating System Concepts).

Finally the following figure describes the way the init process (process 1) manages the login of users.

In the diagram we specify the process name (i.e. the image) of each process and its process id. Init forks a separate process for each terminal line, and each such child execs getty. Getty deals with the characteristics of the line (e.g. baud rate) and of the terminal attached to it, and then execs the login image. Login requests and verifies account information and then execs the shell specified for the user in /etc/passwd. Init waits for its children; when it finds that the process on a terminal line has terminated, it restarts the login sequence on that line. Init also waits for non-terminal processes that have init as parent, so that they do not remain zombies.

Here is a useful Standard C function:

    #include <stdlib.h>
    int system(const char *cmdstring);
which is equivalent to a fork, followed in the child by an exec of "/bin/sh -c cmdstring", followed in the parent by a waitpid for the termination of the child. It returns the termination status of the shell, or -1 if the fork fails.

For example:

  #include <stdlib.h>

  int main(void)
  {
    system("ls");   /* run ls through the shell and wait for it */
    return 0;
  }

Files

For information on files and I/O, the reference is Advanced Programming in the UNIX(R) Environment by W. Richard Stevens (a new edition is coming out in 2005). Pages 777-799 in Computer Systems: A Programmer's Perspective by Randal E. Bryant and David R. O'Hallaron are also very useful.

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    int open(const char *pathname, int oflag [, mode_t mode] );

    It returns a file descriptor (a non negative integer) if successful;
    otherwise it returns -1.
    oflag is the OR of a number of flags such as 
	O_RDONLY, O_WRONLY, O_RDWR (read only, write only, read+write)
	O_NONBLOCK (do not block waiting for the operation; normally we block)
	O_SYNC (each write waits until the data is physically written)
	O_CREAT, create the file if it does not exist
	O_EXCL, gives rise to an error (atomically) if also O_CREAT is 
		specified and the file already exists. 
	O_APPEND, each write appends at the end of the file.
    mode specifies the rights (read/write/ .. for various users) in the case
    that the file is being created. It is the OR of flags such as
	S_IRWXU, read,write,execute permission for owner
	S_IRUSR, read permission for owner ...
    The rights of the created file will be determined on the basis of what
    we say when we create it, and what is specified by the umask at 
    the time. Namely the rights will be (~umask & mode) where umask is the
    current value of the file creation mask ('~' is bitwise complement
    and '&' is bitwise and).
    We will say more about umask when we talk of file protection.
    The file descriptor is an index for an entry in a table in process space.
    That entry contains some flags (one such flag will specify what
    to do with this open file in the case of an exec call, if to pass
    this open file to the new image or not) and a pointer to an open 
    file object in system space.

    #include <unistd.h>
    int close (int filedescriptor);

    Here we close a file given its filedescriptor. It returns 0 iff OK,
    otherwise -1.

    #include <unistd.h>
    ssize_t read(int filedes, void *buffer, size_t nbytes);

    It reads from the specified file up to the specified number of bytes.
    It returns the number of bytes actually read (and moves the cursor by
    the same number), 0 at end of file, or -1 in case of error. Read the
    manual (or Stevens) for information in the case that we are reading
    from a pipe or from a locked file.

    #include <unistd.h>
    ssize_t write(int filedes, const void *buffer, size_t nbytes);

    Similar to read, but now we move nbytes characters from buffer
    to the filedes file. It returns the number of bytes actually
    transferred, or -1 in case of error.

    #include <unistd.h>
    off_t lseek (int filedes, off_t offset, int whence);

    whence specifies from where we count the seek movement
      SEEK_SET   from the beginning
      SEEK_CUR   from the current position
      SEEK_END   from the end of the file
    offset specifies how far we have to move the cursor.  We can use 
    the SEEK_CUR value, for example, to determine what is the current
    cursor position in a file. Say x is of type off_t and fd is the
    descriptor of a file we are currently using, then
	x = lseek(fd, 0, SEEK_CUR);
    will store in x the current cursor position in fd.
	
    
    #include <sys/types.h>
    #include <unistd.h>
    #include <fcntl.h>
    int fcntl(int filedes, int request [,int argument | struct flock *argument]);

    It has a number of roles, the main ones being to read/set the lock of a 
    file or to read/set the mode of a file. Another role is to specify if
    files that are open should remain open across an exec call [this is
    the default behavior]. For example if fd is an open file then
	fcntl(fd, F_SETFD, FD_CLOEXEC);
	exec...
    will ensure that fd is closed in the executed program.
    fcntl is also used to change characteristics of a file, for instance
    to change its use from blocking to non-blocking mode or vice versa.

    #include <sys/types.h>
    #include <unistd.h>
    #include <fcntl.h>
    int dup2(int old, int new);

    It makes a file descriptor in the per-process open file table point
    to the same system open file table entry as another one. old is an
    open file descriptor in the per-process open file table; new is a
    non-negative integer. If new is equal to old, nothing happens. If new
    denotes an open file different from old, it is first closed. Then new
    is made to point to the same system open file table entry as old.
    The return value is new in case of success, -1 in case of error.

    For example, if fid denotes the file descriptor of a file opened for
    reading, we can read from this file as if it were the standard input
    by doing:

	if(fid != STDIN_FILENO){
	    if(dup2(fid,STDIN_FILENO) != STDIN_FILENO) {
		printf("Error\n");
                exit(1);
            }
	    close(fid);
	}
	/* read happily from the standard input what was referenced
           by fid. */

    #include <unistd.h>
    #include <sys/ioctl.h>

    int ioctl(int filedescriptor, int request, ... /* void *arg */);

    ioctl is used for all sorts of operations on files and devices. 
    We don't do anything with it. It is here only as a reminder of
    where to look when trying to do something with a file and you
    do not know what else to do.
Here is an example of use of open to create and share a file.

It is important to remember that we are here talking of the system service interface to files. We are not talking of functions in the C standard library, such as printf, that operate on FILEs. However, it is possible to move between Unix file descriptors and Standard C streams. Here are the library functions that do the appropriate conversions:

  #include <stdio.h>
  int fileno(FILE *stream);
Given an open stream, fileno returns the file descriptor associated with it, or -1 in case of failure.
  #include <stdio.h>
  FILE *fdopen(int filedes, const char *mode);
Given an open file with descriptor filedes and a mode string such as "r", "w" or "r+" (compatible with the way the descriptor was opened), fdopen associates a stream with the file and returns it. In case of failure it returns the null pointer. Beware that a FILE stream has buffering associated with it, so one may need to call fflush to be sure that what we write goes out now.

Here is an example that uses execvp, open, dup2.

The function stat is used to retrieve important properties of files.

     #include <sys/types.h>
     #include <sys/stat.h>

     int stat(const char *path, struct stat *buff);
The following is an example of use of the stat function and here is the stat structure as defined on my system (Digital Unix):
    struct  stat
    {
   	dev_t	st_dev;		/* ID of device containing a directory*/
				/*   entry for this file.  File serial*/
				/*   no + device ID uniquely identify */
				/*   the file within the system */
	ino_t	st_ino;		/* File serial number */
	mode_t	st_mode;	/* File mode; see #define's in */
				/*   sys/mode.h */
	nlink_t	st_nlink;	/* Number of links */
	uid_t	st_uid;		/* User ID of the file's owner */
	gid_t	st_gid;		/* Group ID of the file's group */
	dev_t	st_rdev;	/* ID of device */
				/*   This entry is defined only for */
				/*   character or block special files */
	off_t	st_size;	/* File size in bytes */

				/* Times measured in seconds since */
				/*   00:00:00 GMT, Jan. 1, 1970 */
	time_t	st_atime;	/* Time of last access */
	int	st_spare1;
	time_t	st_mtime;	/* Time of last data modification */
	int	st_spare2;
	time_t	st_ctime;	/* Time of last file status change */
	int	st_spare3;

	uint_t	st_blksize;	/* Size of block in file */
        int     st_blocks;      /* blocks allocated for file */

        uint_t  st_flags;       /* user defined flags for file */
        uint_t  st_gen;         /* file generation number */
    };

The structure dirent and the functions opendir and readdir can be used to find the files included in directories.
     #include <sys/types.h>
     #include <dirent.h>

     DIR *opendir(const char *dirname);
     struct dirent *readdir(DIR *dirpointer);

     struct dirent {
       ino_t  d_ino;      // file number of entry
       ushort_t d_reclen; // length of this record
       ushort_t d_namlen; // length of string in d_name
       char d_name[256];  // name of entry
     };
Only the fields d_ino and d_name are required by POSIX; the others vary between systems. This use is shown in the following example.

Control Blocks for files

When a process forks, its child shares the parent's open files. That is, files that are open in the parent are also open in the child, and the two processes share the cursor on each such file: if a file was open for reading, what is read by the parent is not read by the child and vice versa. On the other hand, if the parent and the child each open the same file after the fork, the two processes do not share cursors. The situation is described in the following figure.

The maximum number of entries in the file descriptor table can be determined with the call getdtablesize (or, in POSIX, with sysconf(_SC_OPEN_MAX)). It will be at least 64 (on my Digital Unix it is 4096).

The buffer cache has different kinds of buffers for different kinds of data. For block I/O one usually uses large blocks, say 8KB; for character-oriented I/O one usually uses small blocks, say 64 bytes. Notice that write normally copies the data into buffers and then returns to the caller, i.e. the data is not written immediately to disk: write is not synchronous [some write operations, say of inode and directory information, are synchronous; in the words of Ousterhout, synchronous writes are one of the roots of bad performance in OSs]. The call sync() schedules the write out of all modified buffers, while fsync(filedes) forces out only the data of a specific file and returns when it has been written.

Funny goings-on with Printf

Unix supports IO operations (open, close, read, write, lseek, ..). It does IO buffering but, as far as the user is concerned, requests are carried out immediately.
C has standard IO operations (fopen, fclose, scanf, printf, ..). It also does IO buffering (fflush forces the write out).
Where Unix does buffering in system space, C does buffering in user space. This can lead to some interesting behaviors. Here are three programs, Program 1, Program 2 and Program 3, that differ in a single statement:

/* Program 1 */          /* Program 2 */                       /* Program 3 */
int main(void){          int main(void){                       int main(void){
printf("Roses..\n");     printf("Roses..\n"); fflush(NULL);    printf("Roses..");
write(1,"Violets\n",8);  write(1,"Violets\n",8);               write(1,"Violets\n",8);
exit(0);}                exit(0);}                             exit(0);}
/* each program needs #include <stdio.h>, <stdlib.h> and <unistd.h> */
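If you run Program 1 on a terminal you get
	Roses..
	Violets
because when standard output is a terminal it is line buffered, so the "\n" causes an immediate flush of the program buffer. If you run Program 1 redirecting output to a disk file, you will find there:
	Violets
	Roses..
because when standard output is a disk file it is fully buffered: the "\n" does not cause a flush, and "Roses.." leaves the program buffer only when exit flushes it, after write has already delivered "Violets".
If you run Program 2, it will print
	Roses..
	Violets
no matter whether output is to the terminal or to a disk file. And if you run Program 3, on a terminal or to a disk file, the output is
	Violets
	Roses..
since "Roses.." has no "\n", it stays in the program buffer until exit flushes it, while write delivers "Violets" at once. Only one process is involved, so the order does not depend on scheduling.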

Funny goings-on with Fork

Here is a simple C program using Fork.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void){
  pid_t pid;
  int i;
  for (i=0; i<3; i++){
  if ((pid=fork()) <0) {
    printf("Sorry, cannot fork\n");
  } else if (pid == 0) {
    printf("child %d\n", i);
  } else {
    printf("parent %d\n", i);}}
  exit(0);}
QUESTION: How many processes are involved, in total, in this program?

If you say four, the parent and the three children, you are wrong since each child tries to continue the loop as its parent was doing.
If you say an infinite number, you are wrong: a child resumes the loop with the value of i that its parent had when it forked, so it has fewer iterations left than its parent had at the start. Since i is bounded by 3, iterating and forking stop.

The correct answer is 8 (thanks to Barry Ortlip for the correct number and the explanation for this result): at each iteration every existing process forks once, so the number of processes doubles, and after 3 iterations there are 2*2*2 = 8 processes.

QUESTION: If instead of having 3 in the loop we had 5, how many processes we would have in total?

By the way, if we run this program we get as output (the order of the lines may differ from run to run):

	parent 0
	child 0
	parent 1
	parent 1
	child 1
	parent 2
	child 1
	parent 2
	child 2
	parent 2
	child 2
	child 2
	parent 2
	child 2
That is, we have 14 lines, of which 7 start with "parent" and 7 start with "child". Can you explain why this is so? How many lines would be written if we had 5, not 3, iterations? How many starting with "parent"? How many with "child"?

Real confusion when Fork and Printf get together

I compiled the fork program to a.out and gave the command:
	a.out > temp
Here is its output:
	parent 0
	parent 1
	parent 2
	parent 0
	parent 1
	child 2
	child 0
	parent 1
	parent 2
	parent 0
	child 1
	child 2
	parent 0
	child 1
	parent 2
	child 0
	child 1
	parent 2
	child 0
	parent 1
	child 2
	child 0
	child 1
	child 2
We find 24 lines, 12 starting with "parent", 12 starting with "child". We notice further that we have 4 of each "parent 0", "parent 1", "parent 2", "child 0", "child 1", "child 2".

QUESTION: What is happening? What would happen if we have 5 instead of 3 iterations?

Perhaps this diagram helps:

		P(0)---------------------C(0)
		|                         |
	  +----------+              +----------+
	 P(1)       C(1)           P(1)       C(1)
	 |           |              |          |
     +-------+   +-------+      +-------+   +-------+
    P(2)    C(2) P(2)   C(2)   P(2)    C(2) P(2)   C(2)

ingargio@joda.cis.temple.edu