[System Services], [Fork], [Exit], [Wait], [Exec], [Files], [File Blocks], [Fun with Printf], [Fun with Fork], [Fork and Printf]
Here are some scattered notes about Unix. Your reference is the R. Stevens book (Advanced Programming in the Unix Environment).
The following figure (from Stevens, page 168) describes the layout of the (virtual) address space of a process. This space is essentially in two parts: a top part with the stack, and a bottom part with the text, the data, and the heap. Between the two is "empty" space, so the two parts are mapped either with two segments or with two separate page tables. The stack area is usually terminated by a guard, i.e. an area to which the program has no access rights, so that an exception is raised when the program runs out of stack.
By the way, the external variables etext, edata, and end mark respectively the first address past the end of the text, initialized data, and uninitialized data areas, and environ is the pointer to the environment variable structure. I have written the following program to see the layout. It resulted in the following output on my Digital workstation:
    &etext   = 120001610
    &main    = 1200011a0
    &edata   = 140000260
    &end     = 140000280
    &a       = 140000028
    &b       = 140000010
    &j       = 11ffffc48
    &k       = 11ffffc4c
    argv     = 11ffffc68
    *argv    = 11ffffd48
    environ  = 11ffffc78
    *environ = 11ffffd4e

and the following output on my Linux:

    &etext   = 0x804857e
    &main    = 0x8048400
    &edata   = 0x804975c
    &end     = 0x8049774
    &a       = 0x8048588
    &b       = 0x804965c
    &j       = 0xbffff9d4
    &k       = 0xbffff9d0
    argv     = 0xbffffa44
    *argv    = 0xbffffb4e
    environ  = 0xbffffa4c
    *environ = 0xbffffb56
A process, the parent, creates another process, the child, with the fork system service:
    #include <sys/types.h>    [usually these files are in /usr/include]
    #include <unistd.h>

    pid_t fork(void);

    It returns
        -1    failure to fork (we are still in the lonely parent)
         0    we are in the child
        >0    the id of the child, we are in the parent

A full copy of the address space of the parent is given to the child (you may have COW: Copy-On-Write, i.e. a page is copied only when it is necessary to differentiate its content in the parent's and in the child's address space. In Unix the command vmstat shows, among other things, the number of cow pages). Exceptions are:
What happens to a child if the parent terminates before the child? Answer: it is given as parent the init process. Here are three standard processes and their process ids:
When a process terminates it informs its parent process of this fact with the SIGCHLD signal and passes status information to it. The parent process retrieves this information with the wait and waitpid system requests (or it indicates that it is not interested in that information by setting the disposition of the SIGCHLD signal to SIG_IGN, i.e. ignored). What happens between the time a child terminates and the time its parent gets the status info? Answer: the child is not allowed to fully terminate and it is said to be in a zombie state. Here is a program that shows how zombies can occur.
    #include <stdlib.h>
    void exit(int status);

    #include <unistd.h>
    void _exit(int status);

    #include <stdlib.h>
    int atexit(void (*func)(void));    /* returns 0 iff OK */

exit is called for normal termination. It is usually a function in the standard C library, not a system service. It invokes all the routines registered by atexit calls, cleans up the I/O (flushing and closing the standard I/O streams), and calls _exit. The C library (not the kernel) keeps a stack of at least 32 functions registered with atexit. At normal termination these functions are popped one by one from the stack and executed, i.e. in the reverse order of their registration.
When _exit is executed, it closes all open files, resets the
parent of the process's children to 1 (the init process),
releases current locks, resets semaphores, and releases storage.
If the parent is waiting, it is notified and the current process
terminates; otherwise the current process remains as a zombie
(note that process 1 is always available to do a wait, thus it causes
no zombies).
The following figure derived from page 164 in Stevens
shows how exit and _exit are related.
    #include <sys/types.h>
    #include <sys/wait.h>

    pid_t wait(int *statloc);
      o the returned pid_t is the pid of the terminated child
      o *statloc receives the status returned by the child
      o If there is a zombie, pick up its info, terminating the zombie, and
        continue; otherwise wait for the termination of a child and, when
        that happens, continue.

    pid_t waitpid(pid_t pid, int *statloc, int options);
      o pid is the pid of the process we are waiting for if pid is greater
        than 0. Otherwise it represents specific groups of acceptable processes
      o statloc and the returned value are as for wait
      o options can be
          WNOHANG    if no acceptable child has terminated yet, return at once with 0
          WUNTRACED  used in systems with job control
There are a number of system services that we generically call exec services. Here is one such service:
    #include <unistd.h>

    int execve(const char *pathname, char *const argv[], char *const envp[]);
    /* pathname identifies an executable file                          */
    /* argv     pointer to null-terminated array of pointers to null   */
    /*          terminated character strings (first is name of called  */
    /*          program)                                               */
    /* envp     pointer to null-terminated array of pointers to null   */
    /*          terminated character strings                           */

The program identified by pathname is executed in place of the current one as if it had been called in the usual way, i.e. as

    main(int argc, char *argv[], char *envp[]);

All exec services replace the image of the calling process with a new image. In the different exec services the new image is identified by an absolute or relative pathname, it receives all its parameters in a single array argv or as individual parameters, and it receives information about the environment variables through the variable environ or directly as a parameter.
The process executing the exec command gives to the executed process information like:
One can think of a process executing an exec call as an actor
changing the script that it is acting.
An example of use of execvp appears in the section on files.
The following picture shows what happens when the Unix shell executes a user command (from Silberschatz, Peterson, and Galvin: Operating System Concepts)
Finally, the following figure describes the way the init process (process 1) manages the login of users.
In the diagram we specify the process name (i.e. the image) of each process and its process id. Init forks to have a separate process on each terminal line. Each child of init execs getty. Getty deals with the characteristics of the line (baud rate) and of the terminal attached to the line, and execs the login image. Login requests and verifies account information and then execs the shell specified for the user in /etc/passwd. The init process waits and, once it recognises that a login process has terminated, restarts the login sequence on that line. Init also takes care of the termination of non-terminal processes that have init as parent and makes sure they do not become zombies.
Here is a useful Standard C function:
    #include <stdlib.h>

    int system(const char *cmdstring);

which is equivalent to a fork, followed by an exec of the command "sh -c cmdstring", followed by a waitpid that waits for the termination of the forked process.
For example:
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        system("ls");    /* run "ls" through the shell and wait for it */
        return 0;
    }
For information on files and I/O, the reference is Advanced Programming in the UNIX(R) Environment by W. Richard Stevens (a new edition is coming out in 2005). Pages 777-799 in Computer Systems: A Programmer's Perspective by Randal E. Bryant and David R. O'Hallaron are also very useful.
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>

    int open(const char *pathname, int oflag [, mode_t mode]);

It returns a file descriptor (a non-negative integer) if successful; otherwise it returns -1.
oflag is the OR of a number of flags such as
    O_RDONLY, O_WRONLY, O_RDWR   (read only, write only, read+write)
    O_NONBLOCK   (do not wait for completion [normally we block])
    O_SYNC       (whether the operation is synchronous)
    O_CREAT      create the file if it does not exist
    O_EXCL       gives rise to an error (atomically) if also O_CREAT is
                 specified and the file already exists
    O_APPEND     each write appends at the end of the file
mode specifies the rights (read/write/.. for various users) in the case that the file is being created. It is the OR of flags such as
    S_IRWXU      read, write, execute permission for owner
    S_IRUSR      read permission for owner
    ...
The rights of the created file will be determined on the basis of what we say when we create it, and what is specified by the umask at the time. Namely, the rights will be (~umask & mode) where umask is the current value of the file creation mask ('~' is bitwise complement and '&' is bitwise and). We will say more about umask when we talk of file protection.
The file descriptor is an index for an entry in a table in process space. That entry contains some flags (one such flag specifies what to do with this open file in the case of an exec call, i.e. whether or not to pass this open file to the new image) and a pointer to an open file object in system space.

    #include <unistd.h>

    int close(int filedescriptor);

Here we close a file given its file descriptor. It returns 0 iff OK, otherwise -1.

    #include <unistd.h>

    ssize_t read(int filedes, void *buffer, size_t nbytes);

It reads from the specified file at most the specified number of bytes. It returns the number of bytes actually read (and moves the cursor by the same number), 0 at end of file, or -1 in case of error. Read the manual (or Stevens) for information on the case that we are reading from a pipe or from a locked file.
    #include <unistd.h>

    ssize_t write(int filedes, const void *buffer, size_t nbytes);

Similar to read, but now we move nbytes characters from buffer to the file filedes. It returns the number of bytes actually transferred, or -1 in case of error.

    #include <unistd.h>

    off_t lseek(int filedes, off_t offset, int whence);

whence specifies from where we count the seek movement
    SEEK_SET   from the beginning
    SEEK_CUR   from the current position
    SEEK_END   from the end of the file
offset specifies how far we have to move the cursor.
We can use the SEEK_CUR value, for example, to determine the current cursor position in a file. Say x is of type off_t and fd is the descriptor of a file we are currently using, then
    x = lseek(fd, 0, SEEK_CUR);
will store in x the current cursor position in fd.

    #include <sys/types.h>
    #include <unistd.h>
    #include <fcntl.h>

    int fcntl(int filedes, int request [, int argument | struct flock *argument]);

It has a number of roles, the main ones being to read/set the lock of a file or to read/set the mode of a file. Another role is to specify whether files that are open should remain open across an exec call [this is the default behavior]. For example, if fd is an open file then
    fcntl(fd, F_SETFD, FD_CLOEXEC);
    exec...
will ensure that fd is not open in the executed program. fcntl is also used to change characteristics of a file, for instance to change its use from blocking to non-blocking mode or vice versa.

    #include <sys/types.h>
    #include <unistd.h>
    #include <fcntl.h>

    int dup2(int old, int new);

It creates a new file descriptor in the per-process open file table. old is an open file descriptor in the per-process open file table. new is a non-negative integer. If new is equal to old, nothing happens. If new denotes an open file different from old, it is closed and then opened as pointing to the same system open file table entry as old. If new does not denote an open file, it is opened as pointing to the same system open file table entry as old.
The return value is negative in case of error, the new file descriptor in case of success. For example, if fid denotes the file descriptor of a file opened for reading, we can read from this file as if it were the standard input by doing:

    if (fid != STDIN_FILENO) {
        if (dup2(fid, STDIN_FILENO) != STDIN_FILENO) {
            printf("Error\n");
            exit(1);
        }
        close(fid);
    }
    /* read happily from the standard input what was referenced by fid. */

    #include <unistd.h>
    #include <sys/ioctl.h>

    int ioctl(int filedescriptor, int request, void *arg);

ioctl is used for all sorts of operations on files and devices. We don't do anything with it. It is here only as a reminder of where to look when trying to do something with a file and you do not know what else to do.

Here is an example of use of open to create and share a file.
It is important to remember that we are here talking of the system service interface to the files. We are not talking of functions in the C standard library, such as printf, that operate on FILEs. However, it is possible to move between Unix file descriptors and Standard C files. Here are the library functions that do the appropriate conversions:

    #include <stdio.h>

    int fileno(FILE *stream);

Given an open FILE pointed to by stream, it returns the file descriptor associated with it. It returns -1 in case of failure.

    #include <stdio.h>

    FILE *fdopen(int filedes, const char *mode);

Given an open file with descriptor filedes, and a mode string formed with "r", "w", "a", and "+", it associates a stream to the file and returns it. In case of failure it returns the null pointer. Beware that the FILE structure has buffering associated with it. Thus one may need to call fflush to be sure that what we write goes out now.
Here is an example that uses execvp, open, dup2.
The function stat is used to retrieve important properties of files.
    #include <sys/types.h>
    #include <sys/stat.h>

    int stat(const char *path, struct stat *buff);

The following is an example of use of the stat function and here is the stat structure as defined on my system (Digital Unix):

    struct stat {
        dev_t    st_dev;      /* ID of device containing a directory */
                              /* entry for this file. File serial */
                              /* no + device ID uniquely identify */
                              /* the file within the system */
        ino_t    st_ino;      /* File serial number */
        mode_t   st_mode;     /* File mode; see #define's in */
                              /* sys/mode.h */
        nlink_t  st_nlink;    /* Number of links */
        uid_t    st_uid;      /* User ID of the file's owner */
        gid_t    st_gid;      /* Group ID of the file's group */
        dev_t    st_rdev;     /* ID of device */
                              /* This entry is defined only for */
                              /* character or block special files */
        off_t    st_size;     /* File size in bytes */

                              /* Times measured in seconds since */
                              /* 00:00:00 GMT, Jan. 1, 1970 */
        time_t   st_atime;    /* Time of last access */
        int      st_spare1;
        time_t   st_mtime;    /* Time of last data modification */
        int      st_spare2;
        time_t   st_ctime;    /* Time of last file status change */
        int      st_spare3;

        uint_t   st_blksize;  /* Size of block in file */
        int      st_blocks;   /* blocks allocated for file */
        uint_t   st_flags;    /* user defined flags for file */
        uint_t   st_gen;      /* file generation number */
    };
    #include <sys/types.h>
    #include <dirent.h>

    DIR *opendir(const char *dirname);
    struct dirent *readdir(DIR *dirpointer);

    struct dirent {
        ino_t    d_ino;         // file number of entry
        ushort_t d_reclen;      // length of this record
        ushort_t d_namlen;      // length of string in d_name
        char     d_name[256];   // name of entry
    };

This use is shown in the following example.
The maximum number of entries in the file descriptor table can be determined with the system call getdtablesize. It will be at least 64 (on my Digital Unix it is 4096).
The buffer cache will have different kinds of buffers for different kinds of data. For block I/O one usually uses large blocks, say 8KB. For character oriented I/O one usually uses small blocks, say 64 bytes. Notice that write normally writes into buffers and then returns to the caller, i.e. the data is not written immediately to disk; that is, the write is not synchronous [some write operations, say of inode and directory information, are synchronous; in the words of Ousterhout, synchronous writes are one of the roots of bad performance in OSs]. The call sync() forces the write out of all buffers, while fsync(filedes) only forces the write out of the buffers of a specific file.
Unix supports IO operations (open, close, read, write, lseek, ..). It
does IO buffering, but, as far as the user is concerned, orders are
immediately carried out.
C has standard IO operations (fopen, fclose, scanf, printf, ..). It also does
IO buffering (fflush to force write out).
Where Unix does buffering in the system space, C does buffering in the
user space. This can lead to some interesting behaviors. Here are three
programs, program 1, program 2, program 3, that differ on a single statement:
    /* all three programs include <stdio.h>, <stdlib.h>, and <unistd.h> */

    /* Program 1 */            /* Program 2 */                       /* Program 3 */
    int main(void){            int main(void){                       int main(void){
      printf("Roses..\n");       printf("Roses..\n");fflush(NULL);     printf("Roses..");
      write(1,"Violets\n",8);    write(1,"Violets\n",8);               write(1,"Violets\n",8);
      exit(0);}                  exit(0);}                             exit(0);}

If you run Program 1 on a terminal you get

    Roses..
    Violets

because "\n" directed to a terminal results in an immediate flush of the program buffer. If you run Program 1 redirecting output to a disk file, you will find there:

    Violets
    Roses..

because "\n" directed to a disk file does not result in an immediate flush of the program buffer.
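If you run Program 2 you get

    Roses..
    Violets

no matter if output is to the terminal or to a disk file. And if you run Program 3 the output is

    Violets
    Roses..

both on the terminal and in a disk file: without a "\n" and without an fflush, "Roses.." stays in the program buffer until exit flushes it, while write delivers "Violets" to the system at once.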
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
      int pid;
      int i;
      for (i=0; i<3; i++){
        if ((pid=fork()) <0) {
          printf("Sorry, cannot fork\n");
        } else if (pid == 0) {
          printf("child %d\n", i);
        } else {
          printf("parent %d\n", i);
        }
      }
      exit(0);
    }

QUESTION: How many processes are involved, in total, in this program?
If you say four, the parent and the three children, you are wrong since each
child continues the loop just as its parent was doing.
If you say an infinite number, you are wrong since a child starts with the
value that i had in its parent at the time of the fork, so its own first full
iteration uses a value that is one greater. Thus, since i is limited by 3,
iterating and forking will stop.
The correct answer is 8 (thanks to Barry Ortlip for the correct number and the explanation for this result):
                        P(0)
                          |
              +-----------------------+
            P(1)                    C(0)
              |                       |
         +----------+            +----------+
       P(2)       C(1)         P(1)       C(1)
         |          |            |          |
      *-----+    +-----+      +-----+    +-----+
          C(2) P(2)  C(2)   P(2)  C(2) P(2)  C(2)
By the way, if we run this program we get as output (you will not always get the lines in this order):

    parent 0
    child 0
    parent 1
    parent 1
    child 1
    parent 2
    child 1
    parent 2
    child 2
    parent 2
    child 2
    child 2
    parent 2
    child 2

That is, we have 14 lines, of which 7 start with "parent" and 7 start with "child". Can you explain why this is so? How many lines would be written if we had 5, not 3, iterations? How many starting with "parent"? How many with "child"?
Now run the same program with its output redirected to a file:

    a.out > temp

Here is its output:

    parent 0
    parent 1
    parent 2
    parent 0
    parent 1
    child 2
    child 0
    parent 1
    parent 2
    parent 0
    child 1
    child 2
    parent 0
    child 1
    parent 2
    child 0
    child 1
    parent 2
    child 0
    parent 1
    child 2
    child 0
    child 1
    child 2

We find 24 lines, 12 starting with "parent", 12 starting with "child". We notice further that we have 4 of each "parent 0", "parent 1", "parent 2", "child 0", "child 1", "child 2".

QUESTION: What is happening? What would happen if we had 5 instead of 3 iterations?
Perhaps this diagram helps:
            P(0)--------------------C(0)
              |                       |
         +----------+            +----------+
       P(1)       C(1)         P(1)       C(1)
         |          |            |          |
      +-----+    +-----+      +-----+    +-----+
    P(2)  C(2) P(2)  C(2)   P(2)  C(2) P(2)  C(2)
ingargio@joda.cis.temple.edu