CIS 1068: Homework 10

Handed out: 03/30/10
Due: by 10pm on 04/05/10

This assignment is similar to Homework 8. Again we start by downloading a file from the internet. If I execute
wget http://gutenberg.org/dirs/7/76/76.txt
I will download the file 76.txt, which contains "The Adventures of Huckleberry Finn".
Again we use the command
java Homework10s10 < 76.txt
to read the file as standard input.

I want to read all the tokens from 76.txt and store them, in lowercase and in sorted order, in an ArrayList. Remember that we can get the tokens with
Scanner scan = new Scanner(System.in);
scan.useDelimiter("\\W+");
and then use the scanner as usual with hasNext() and next().
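
As a minimal sketch of the tokenizing step only, the loop below reads every token and lowercases it. The class name TokenDemo and the method readTokens are hypothetical names chosen for illustration, not part of the assignment. Note that the delimiter "\\W+" also splits on apostrophes, so a word like "don't" yields the two tokens "don" and "t".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TokenDemo {
    // Hypothetical helper: returns all tokens from the scanner,
    // lowercased, in the order they appear in the input.
    static List<String> readTokens(Scanner scan) {
        // Any run of non-word characters separates two tokens.
        scan.useDelimiter("\\W+");
        List<String> tokens = new ArrayList<>();
        while (scan.hasNext()) {
            tokens.add(scan.next().toLowerCase());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Run as: java TokenDemo < 76.txt
        List<String> tokens = readTokens(new Scanner(System.in));
        System.out.println(tokens.size() + " tokens read");
    }
}
```

This sketch keeps the tokens in input order; putting them in sorted order is a separate step.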

As you can imagine, the same tokens will appear multiple times, so associate with each token the number of times it has occurred.
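
One possible way to keep the list sorted while counting, shown here only as a sketch: pair each token with a counter in a small class, and use binary search to find either the existing entry or the position where a new one belongs. The names SortedCounts, Entry, and record are assumptions made for this example; other designs work equally well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedCounts {
    // Hypothetical helper class: one token plus its occurrence count.
    // Entries compare by token only, so the list stays alphabetical.
    static class Entry implements Comparable<Entry> {
        final String token;
        int count;

        Entry(String token) {
            this.token = token;
            this.count = 1;
        }

        @Override
        public int compareTo(Entry other) {
            return token.compareTo(other.token);
        }
    }

    // Records one occurrence of token in a list that is sorted by token.
    static void record(List<Entry> entries, String token) {
        Entry probe = new Entry(token);
        int pos = Collections.binarySearch(entries, probe);
        if (pos >= 0) {
            // Token already present: bump its count.
            entries.get(pos).count++;
        } else {
            // Not found: binarySearch returns -(insertion point) - 1.
            entries.add(-pos - 1, probe);
        }
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        for (String t : new String[] {"the", "a", "the", "and"}) {
            record(entries, t);
        }
        for (Entry e : entries) {
            System.out.println(e.count + "  " + e.token);
        }
    }
}
```

Binary search keeps each lookup fast, although inserting into the middle of an ArrayList still shifts the later elements.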

Finally, print out the 100 most frequent tokens in order of decreasing frequency.

For example, here are my 10 most frequent tokens, each preceded by its count:
6427  and
4975  the
3667  i
3214  a
3020  to
2559  it
2123  t
2070  was
1863  he
1764  of
Your program will take a few seconds to run.
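
The final step, picking the most frequent tokens, could be sketched as follows, assuming the counts have already been collected. The names TopTokens, Entry, and top are hypothetical, and the sample data reuses four counts from the list above. The sketch sorts a copy of the list, so the alphabetical list is left untouched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopTokens {
    // Hypothetical helper class: a token and how often it occurred.
    static class Entry {
        final String token;
        final int count;

        Entry(String token, int count) {
            this.token = token;
            this.count = count;
        }
    }

    // Returns up to n entries with the highest counts, largest first.
    static List<Entry> top(List<Entry> entries, int n) {
        // Sort a copy so the caller's list keeps its original order.
        List<Entry> byCount = new ArrayList<>(entries);
        byCount.sort(Comparator.comparingInt((Entry e) -> e.count).reversed());
        return byCount.subList(0, Math.min(n, byCount.size()));
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("a", 3214));
        entries.add(new Entry("and", 6427));
        entries.add(new Entry("i", 3667));
        entries.add(new Entry("the", 4975));
        for (Entry e : top(entries, 2)) {
            System.out.println(e.count + "  " + e.token);
        }
    }
}
```

Because List.sort is stable, tokens with equal counts stay in their original relative order.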

Email your program (just the .java file, not the .class file) to the Teaching Assistant.