CIS 1068: Homework 8

Handed out: 3/16/10
Due: by 10pm on 3/22/10

We have seen that a command like
java Foo < moo.txt
will feed the contents of moo.txt to Foo as standard input, in place of input typed at the keyboard.
For moo.txt we will use a file that you can download from the Internet. For example, if I execute
wget http://gutenberg.org/dirs/7/76/76.txt
I will download the file 76.txt, which contains "The Adventures of Huckleberry Finn".
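
As a minimal sketch of the redirection described above, the following program reads whatever arrives on standard input and reports how many lines it saw. The class name LineCount and the method countLines are hypothetical, chosen only for illustration; they are not part of the assignment.

```java
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical example class: counts the lines arriving on an input stream.
class LineCount {
    // Reads the stream line by line and returns the number of lines read.
    static int countLines(InputStream in) {
        Scanner scan = new Scanner(in);
        int lines = 0;
        while (scan.hasNextLine()) {
            scan.nextLine();
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) {
        // With "java LineCount < 76.txt", System.in is the file, not the keyboard.
        System.out.println(countLines(System.in));
    }
}
```

After compiling, running java LineCount < 76.txt should print the number of lines in the downloaded file; the program itself never opens the file by name.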

I want to read the moo.txt (or 76.txt) file and count all its tokens. [Remember: a token is normally a maximal sequence of non-whitespace characters, but in this homework we prefer to define a token as a maximal sequence of letters. You can do that by saying
Scanner scan = new Scanner(System.in);
scan.useDelimiter("\\W+");
and then using the scanner as usual with hasNext() and next(). Note that \W matches any character other than a letter, a digit, or an underscore, so digits and underscores also count as part of a token.] Namely, we want to count the tokens of length 1, of length 2, ..., of length 16, and of length greater than 16. You will print out lines specifying the token lengths and the corresponding counts. Then you will print out the most frequent length and its count. Finally, print out a histogram showing on each row the token length, the number of tokens of that length, and a sequence of '*' characters proportional to that number. For example, I got a histogram like this (the last row, labeled 17, is for tokens of length greater than 16):

 1 11087 ************************
 2 21733 ************************************************
 3 31961 ************************************************************************
 4 24591 *******************************************************
 5 12433 ****************************
 6  8175 ******************
 7  4967 ***********
 8  2190 ****
 9  1414 ***
10   683 *
11   285
12   164
13    35
14    20
15     3
16     1
17     3
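
The counting and histogram steps described above could be sketched roughly as follows. This is one possible approach under stated assumptions, not the required solution: the class name TokenHistogram, its method names, and the width of 72 stars for the fullest row are all hypothetical choices, and the star count is scaled with truncating integer division.

```java
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical class name; a sketch, not the required solution.
class TokenHistogram {
    static final int MAX_LENGTH = 16; // longer tokens share bucket MAX_LENGTH + 1
    static final int MAX_STARS = 72;  // assumed number of stars for the fullest row

    // counts[k] is the number of tokens of length k (1..16); counts[17] holds longer ones.
    static int[] countLengths(InputStream in) {
        Scanner scan = new Scanner(in);
        scan.useDelimiter("\\W+");
        int[] counts = new int[MAX_LENGTH + 2]; // index 0 is unused
        while (scan.hasNext()) {
            int len = scan.next().length();
            if (len > 0) {
                counts[Math.min(len, MAX_LENGTH + 1)]++;
            }
        }
        return counts;
    }

    // Returns the bucket with the largest count (the smallest such bucket on ties).
    static int mostFrequentLength(int[] counts) {
        int best = 1;
        for (int len = 2; len < counts.length; len++) {
            if (counts[len] > counts[best]) best = len;
        }
        return best;
    }

    // Builds a row of stars scaled so that the largest count gets MAX_STARS stars.
    static String stars(int count, int max) {
        StringBuilder row = new StringBuilder();
        long n = (max == 0) ? 0 : (long) count * MAX_STARS / max; // truncating division
        for (long i = 0; i < n; i++) row.append('*');
        return row.toString();
    }

    public static void main(String[] args) {
        int[] counts = countLengths(System.in);
        int best = mostFrequentLength(counts);
        System.out.println("Most frequent length: " + best + " (" + counts[best] + " tokens)");
        for (int len = 1; len < counts.length; len++) {
            System.out.printf("%2d %5d %s%n", len, counts[len], stars(counts[len], counts[best]));
        }
    }
}
```

Scaling against the largest count keeps every row within a fixed width, at the cost of showing no stars at all for lengths whose counts are very small relative to the maximum.
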
Be sure to document your code: state your name, the class, the purpose of the lab, and a brief description of each method.

Develop your program on Unix or on Windows using the command window.

Email your program (just the .java file, not the .class file) to the Teaching Assistant.