[Fork], [Exit], [Wait], [Exec], [Files], [File Blocks], [Fun with Printf], [Fun with Fork], [Fork and Printf]
Here are some scattered notes about Unix [version xyz???]. Your reference is the Stevens book (Advanced Programming in the Unix Environment). Read Tanenbaum chapter 7. A good introduction is also in the notes by Jan Newmarch at Canberra University.
The following figure (from Stevens page 168) describes the layout of the (virtual) address space of a process. This space is essentially in two parts: a top part with the stack, and a bottom part with the program text, data, and heap. Between the two is "empty" space, so the two parts are mapped either as two segments or through two separate page tables. The stack area is usually terminated by a guard, i.e. an area to which the program does not have access rights, so that an exception is raised when the program runs out of stack.
A process, the parent, creates another process, the child, with the fork system service:
    #include <sys/types.h>    [usually these files are in /usr/include]
    #include <unistd.h>

    pid_t fork(void);

It returns
    -1   failure to fork (we are still in the lonely parent)
     0   we are in the child
    >0   the id of the child, we are in the parent

A full copy of the address space of the parent is given to the child (you may have COW: Copy-On-Write). Exceptions are:
What happens to a child if the parent terminates before the child? Answer: it is adopted by the init process, which becomes its parent. Here are three standard processes and their process ids:
When a process terminates it passes status information to its parent process. The parent process retrieves this information with the wait and waitpid system requests. What happens between the time a child terminates and its parent gets the status info? Answer: The child is not allowed to fully terminate and it is said to be in a zombie state.
    #include <stdlib.h>
    void exit(int status);

    #include <unistd.h>
    void _exit(int status);

    #include <stdlib.h>
    int atexit(void (*func)(void));   /* returns 0 iff OK */

exit is called for normal termination. It is a function in the standard C library, not a system service. It invokes all the routines registered by atexit calls, cleans up the I/O (flushing and closing the standard I/O streams), and calls _exit. The C library, not the kernel, keeps the stack of functions registered by atexit; Standard C guarantees room for at least 32 of them. At normal termination these functions are popped one by one from the stack and executed, so they run in the reverse order of registration.
When _exit is executed, it closes all open files, resets the
parent of the process's children to 1 (init), releases current locks,
resets semaphores, and releases storage. If the parent is waiting, it is
notified and the current process terminates; otherwise the current process
remains as a zombie (note that process 1 is always available to do a wait,
thus it causes no zombies).
The following figure derived from page 164 in Stevens
shows how exit and _exit are related.
    #include <sys/types.h>
    #include <sys/wait.h>

    pid_t wait(int *statloc);
      o the value returned is the pid of the terminated child (-1 on error)
      o *statloc receives the status returned by the child
      o If there is a zombie child, pick up its info, terminating the zombie, and continue;
        otherwise wait for the termination of a child and, when that happens, continue.

    pid_t waitpid(pid_t pid, int *statloc, int options);
      o pid is the pid of the process we are waiting for if pid is greater than 0.
        Otherwise it represents specific groups of acceptable processes
      o statloc and the returned value are as for wait
      o options can be
          WNOHANG     do not block: if no matching child has terminated yet, return 0
          WUNTRACED   also report children that have stopped (used in systems with job control)
There are a number of system services that we generically call exec services. Here is one such service:
    #include <unistd.h>

    int execve(const char *pathname, char *const argv[], char *const envp[]);
    /* pathname identifies an executable file                             */
    /* argv     pointer to a null-terminated array of pointers to         */
    /*          null-terminated character strings (the first is the       */
    /*          name of the called program)                               */
    /* envp     pointer to a null-terminated array of pointers to         */
    /*          null-terminated character strings                         */

The program identified by pathname is executed in place of the current one as if it had been called in the usual way, i.e. as

    main(int argc, char *argv[], char *envp[]);

All exec services replace the image of the calling process with a new image. The exec services differ in how the new image is identified (by an absolute/relative pathname, or by a file name looked up along PATH), in whether it receives its parameters in a single array argv or as individual parameters, and in whether it receives the environment variables through the global variable environ or directly as a parameter.
The process executing the exec command gives to the executed process information like:
One can think of a process executing an exec call as an actor changing the script that it is acting.
The following picture shows what happens when the Unix shell executes a user command (from Silberschatz, Peterson, and Galvin: Operating Systems Concepts)
Finally the following figure describes the way the init process (process 1) manages login of users.
In the diagram we specify the process name (i.e. the image) of each process and its process id. Init forks to have a separate process on each terminal line. The child, still running the init image, writes a login request on the terminal line and then executes getty. Getty requests and verifies account information and then executes the shell specified for the user in /etc/passwd. The init process waits and, once it recognises that a login process has terminated, restarts the login sequence on that line. Of course init also takes care of the termination of non-terminal processes that have init as parent and makes sure they do not become zombies.
Here is a useful Standard C function:
    #include <stdlib.h>
    int system(const char *cmdstring);

which is equivalent to a fork, followed by an exec for the command "sh -c cmdstring", followed by waitpid, waiting for the termination of the forked process.
For example:
    #include <stdlib.h>
    #define MAXSIZE 256

    int main(int argc, char **argv)
    {
        system("ls");   /* runs "sh -c ls" and waits for it to finish */
        return 0;
    }
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>

    int open(const char *pathname, int oflag [, mode_t mode]);

It returns a file descriptor (a non-negative integer) if successful; otherwise it returns -1.
oflag is the OR of a number of flags such as
    O_RDONLY, O_WRONLY, O_RDWR   read only, write only, read+write
    O_NONBLOCK                   do not wait for completion
    O_SYNC                       each write waits until the data has physically been written
    O_CREAT                      create the file if it does not exist
    O_EXCL                       gives rise to an error (atomically) if O_CREAT is also specified
                                 and the file already exists. [DOES THIS GIVE YOU ANY IDEAS?]
    O_APPEND                     every write appends at the end of the file
mode specifies the rights (read/write/.. for various users) in the case that the file is being created. It is the OR of flags such as
    S_IRWXU   read, write, execute permission for owner
    S_IRUSR   read permission for owner
    ...
The file descriptor is an index for an entry in a table in process space. That entry contains some flags (one such flag specifies what to do with this open file in the case of an exec call, i.e. whether or not to pass it to the new image) and a pointer to an open file object in system space.

    #include <unistd.h>
    int close(int filedescriptor);

Here we close a file given its file descriptor. It returns 0 iff OK, otherwise -1.

    #include <unistd.h>
    ssize_t read(int filedes, void *buffer, size_t nbytes);

It reads from the specified file up to the specified number of bytes. It returns the number of bytes actually read (and moves the cursor by the same number), 0 at end of file, or -1 on error. Read the manual (or Stevens) for information on the case where we are reading from a pipe or from a locked file.

    #include <unistd.h>
    ssize_t write(int filedes, const void *buffer, size_t nbytes);

It writes nbytes bytes from buffer to the specified file and returns the number of bytes actually written, or -1 on error.

    #include <unistd.h>
    off_t lseek(int filedes, off_t offset, int whence);

whence specifies from where we count the seek movement
    SEEK_SET   from the beginning
    SEEK_CUR   from the current position
    SEEK_END   from the end of the file
offset specifies how far we have to move the cursor.
    #include <sys/types.h>
    #include <unistd.h>
    #include <fcntl.h>

    int fcntl(int filedes, int request [, int argument | struct flock *argument]);

It has a number of roles, the main ones being to read/set the lock of a file or to read/set the mode of a file.

    #include <sys/types.h>
    #include <unistd.h>
    #include <fcntl.h>

    int dup2(int old, int new);

It creates a new file descriptor in the per-process open file table. old is an open file descriptor in the per-process open file table. new is a non-negative integer. If new denotes an open file equal to old, nothing happens. If new denotes an open file different from old, it is closed and then opened as pointing to the same system open file table entry as old. If new does not denote an open file, it is opened as pointing to the same system open file table entry as old. For example, if fid denotes the file descriptor of a file opened for reading, we can read from this file as if it were the standard input by doing:

    if (fid != STDIN_FILENO) {
        if (dup2(fid, STDIN_FILENO) != STDIN_FILENO)
            printf("Error\n");
        close(fid);
    }
    /* read happily from the standard input */

    #include <unistd.h>
    #include <sys/ioctl.h>

    int ioctl(int filedescriptor, int request, void *arg);

ioctl is used for all sorts of operations on files and devices. We don't do anything with it. It is here only as a reminder of where to look when you are trying to do something with a file and do not know what else to do.

It is important to remember that we are talking here of the system service interface to files. We are not talking of functions in the C standard library, such as printf, that operate on FILEs.
The buffer cache will have different kinds of buffers for different kinds of data. For block I/O one usually uses large blocks, say 8KB. For character-oriented I/O one usually uses small blocks, say 64 bytes. Notice that write normally copies the data into buffers and then returns to the caller, i.e. the data is not written immediately to disk: the write is not synchronous [some write operations, say, of inode and directory information, are synchronous; in the words of Ousterhout, synchronous writes are one of the roots of bad performance in OSs]. The call sync() forces the write-out of all buffers, while fsync(filedes) forces the write-out of the buffers of one specific file only.
Unix supports I/O operations (open, close, read, write, lseek, ..). It
does I/O buffering, but, as far as the user is concerned, orders are
immediately carried out.
C has standard I/O operations (fopen, fclose, scanf, printf, ..). It also does
I/O buffering (use fflush to force write-out).
Where Unix does buffering in the system space, C does buffering in the
user space. This can lead to some interesting behaviors. Here are three
programs, program 1, program 2, program 3, that differ in a single statement:
    /* All three programs start with: */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Program 1 */
    int main(void){
        printf("Roses..\n");
        write(1, "Violets\n", 8);
        exit(0);
    }

    /* Program 2 */
    int main(void){
        printf("Roses..\n"); fflush(stdout);
        write(1, "Violets\n", 8);
        exit(0);
    }

    /* Program 3 */
    int main(void){
        printf("Roses..");
        write(1, "Violets\n", 8);
        exit(0);
    }

If you run Program 1 on a terminal you get
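    Roses..
    Violets

because "\n" directed to a terminal results in an immediate flush of the program buffer. If you run Program 1 redirecting output to a disk file, you will find there:

    Violets
    Roses..

because "\n" directed to a disk file does not result in an immediate flush of the program buffer. If you run Program 2 you get

    Roses..
    Violets

no matter if output is to the terminal or to a disk file. And if you run Program 3, with the usual line-buffered terminal output, you get

    Violets
    Roses..

in both cases: since "Roses.." has no "\n" and is never flushed explicitly, it stays in the program buffer until exit flushes it, which happens after write has already delivered "Violets". The order does not depend on scheduling, because a single process performs both operations in sequence.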
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void){
        int pid;
        int i;
        for (i=0; i<3; i++){
            if ((pid=fork()) <0) {
                printf("Sorry, cannot fork\n");
            } else if (pid == 0) {
                printf("child %d\n", i);
            } else {
                printf("parent %d\n", i);
            }
        }
        exit(0);
    }

QUESTION: How many processes are involved, in total, in this program?
If you say four, the parent and the three children, you are wrong since each
child tries to continue the loop as its parent was doing.
If you say an infinite number, you are wrong since the child starts with a
copy of i, so its own first iteration uses a value of i one greater than the
one at which it was created. Thus, since i is limited by 3, iterating and
forking will stop.
The correct answer is 8 (thanks to Barry Ortlip for the correct number and the explanation for this result):
                            P(0)
                             |
             +-------------------------------+
            P(1)                            C(0)
             |                               |
     +---------------+              +------------------+
    P(2)            C(1)           P(1)               C(1)
     |               |              |                  |
  *------+       +-------+      +-------+         +--------+
        C(2)    P(2)    C(2)   P(2)    C(2)      P(2)     C(2)

By the way, if we run this program we get as output (you will not always get the lines in this order):

    parent 0
    child 0
    parent 1
    parent 1
    child 1
    parent 2
    child 1
    parent 2
    child 2
    parent 2
    child 2
    child 2
    parent 2
    child 2

That is, we have 14 lines, of which 7 start with "parent" and 7 start with "child". Can you explain why this is so? How many lines would be written if we had 5, not 3, iterations? How many starting with "parent"? How many with "child"?
Now suppose we run the same program with its output redirected to a disk file:

    a.out > temp

Here is its output:

    parent 0
    parent 1
    parent 2
    parent 0
    parent 1
    child 2
    child 0
    parent 1
    parent 2
    parent 0
    child 1
    child 2
    parent 0
    child 1
    parent 2
    child 0
    child 1
    parent 2
    child 0
    parent 1
    child 2
    child 0
    child 1
    child 2

We find 24 lines, 12 starting with "parent", 12 starting with "child". We notice further that we have 4 of each of "parent 0", "parent 1", "parent 2", "child 0", "child 1", "child 2".
QUESTION: What is happening? What would happen if we have 5 instead of 3 iterations?
Perhaps this diagram helps:
              P(0)---------------------C(0)
               |                        |
          +----------+            +----------+
         P(1)       C(1)         P(1)       C(1)
          |          |            |          |
      +-------+  +-------+    +-------+  +-------+
     P(2)    C(2) P(2)  C(2) P(2)    C(2) P(2)  C(2)
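One way to check your answers to the questions above is to count lines for a loop of n iterations. The following is a sketch under two assumptions: output to a terminal is flushed at every "\n", and output to a disk file stays in the program buffer, which fork copies into the child, until exit.

```latex
% Processes: every process forks once per remaining iteration.
\[
  \text{processes}(n) = 2^{n}, \qquad \text{processes}(3) = 8 .
\]
% Terminal: at iteration i there are 2^i processes, each fork printing
% one "parent i" line and one "child i" line.
\[
  \text{lines}_{\text{terminal}}(n) = \sum_{i=0}^{n-1} 2 \cdot 2^{i} = 2\,(2^{n}-1),
  \qquad 2\,(2^{3}-1) = 14, \qquad 2\,(2^{5}-1) = 62 .
\]
% Disk file: each of the 2^n processes flushes, at exit, a buffer holding
% one line for each of the n iterations.
\[
  \text{lines}_{\text{file}}(n) = n \cdot 2^{n},
  \qquad 3 \cdot 2^{3} = 24, \qquad 5 \cdot 2^{5} = 160 .
\]
```

In the terminal case half of the lines start with "parent" and half with "child" (7 and 7 for n = 3, 31 and 31 for n = 5). In the disk file case each of the 2n distinct lines appears 2^(n-1) times (4 times for n = 3, 16 times for n = 5).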
ingargiola.cis.temple.edu