CIS4307 - Homeworks 2: Thread Pool and Protected Bounded Buffers

Given: September 22, 2009
Due Date: October 12, by 10pm

A common problem is to determine if a file (or a network message) has been modified. Certain functions, called hash functions, are used to generate from the content of a file a small "digest". Examples of such functions are MD5 and SHA1. We will use MD5 which produces from an array of characters a digest consisting of 16 bytes of binary data. Any change in the array of characters will result in a different digest (almost with certainty). Here is an example of use of MD5: you are given a file; you compute the MD5 digest of its content and save that digest. When later on you want to determine if the content of the file has been modified, you recompute its digest. If the new and old digests are equal, the content of the file has not been changed, otherwise it has [of course the old digest must have been preserved in a reliable store].
You can use any C implementation of MD5 that you find. Using such code you should implement the following two functions:

    /* Given an array of count bytes in buffer, stores in thecode its MD5
       digest. It returns 0 iff successful
    */
    int MD5block(const unsigned char *buffer, int count, unsigned char thecode[16]);

    /* Given the name of an ordinary file, stores in thecode the MD5 digest of 
       its content. It returns 0 iff successful. In the implementation of this 
       function you should use memory mapped io 
    */
    int MD5file(const char *filename, unsigned char thecode[16]);

You are to write a programs lab2 that will be given as command line parameter the name of a file, for example "/usr", and it will produce a file "digests.txt" that will consist of lines, one for each of the ordinary files and directories that are readable by any user and are reacheable from "/usr" without following links, of the form

    filename digest
where the filename is the full path from the root and the MD5 digest is in hexadecimal. The MD5 of an ordinary file is computed in the obvious way. In the case of an empty directory the digest consists of 16 zero bytes. If the directory is not empty, its digest is the sum of the digests of the member files. The sum of digests is done as if they were arrays of 8 shorts and without carry between shorts.

Your program must use at least one protected bounded buffer and a pool of threads, with at least, say 5 threads in the pool. Your program should be a well designed concurrent program with a clean termination when all files have been processed.
Your program will open the digests.txt file, create the protected bounded buffer [make it of size 10 so it gets full often], put in it the name of the initial file (i.e. something like /usr), start the threads. Then it will wait for the threads to terminate and clean up.
Your threads should have the following behavior:

	Let filename be a string pointer
	while ((filename = get item from protected bounded buffer) != NULL) {
	    Let current be the file descriptor obtained by opening the file filename
	    if (current names an ordinary file or a directory) {
		if (current names an ordinary file) {
		    compute its digest and write filename digest to the digest.txt file
		    add digest of this file to digest of containing directory
		    if (this was the last child of that directory)
			Write directory name and its digest to the digest.txt file
		} else if (current names a directory) {
		    Let s be a digest consisting of zeroes.
		    for each (child x of the directory) {
			put x in the protected bounded buffer
		    }
	    }
	}

Note that the protected bounded buffer must be designed with the following in mind:

In your code documentation and in the README file discuss your design and the reasons for your decisions.

Submit your homework as a tar file to the dropbox on Blackboard. Be sure to verify that the homework is there. Or be sure to receive within 24 hours a message back from the TA/instructor confirming that he has received your mail with the homework. If you do not receive that reply contact both the TA/instructor.