CIS71 LAB10R -- Counting Word Occurrences -- Arrays, Strings, and I/O
Assigned Oct 2, 2005. Due Oct 9, 2005.
Objectives.Review arrays. Introduce string processing. Introduce code recycling. Practice top-down problem solving.
Reading: Hanly--Koffman Chapter 9
Problem Write a program that will read a sequence of tokens
(English words and punctuation marks) each separated from the next by
a blank. For example:
the quick , brown fox jumped over the tall white fence . . . and fell !
The input will contain no capital letters and will end with an
end-of-file mark (eof). Valid input will not contain any tokens longer than 15 characters. Valid input will not contain more than 200 tokens.
Your program is to produce the following information about the input
sequence of tokens:
- The number of punctuation marks
- The number of words of length 1, the number of words of length 2, ..., the number of words of length 15. Note: Any sequence of alphabetic characters will be considered a word, even if it doesn't appear in Webster's. (See help.)
- The list of words in sorted order (See help.)
- The number of occurrences of each distinct word. (See help.)
What to Look For: There are two important factors in solving
this problem. One, you need to understand and document the sequence
of steps the program is to go through to produce the desired results.
At the same time, you should carefully identify all the data stores
you need (in a data table) and you should begin making a list of the
separate functions you need to complete this task. You should work on
this problem in an incremental fashion.
Step 1: Program Decomposition and stepwise refinement. Start with a brief list of the (approximately four) tasks your program has to perform. (You can have one of the TAs check this if you want.) For each task, make a short list of subtasks necessary in order to perform the task. (You can have one of the TAs check this if you want.) Some of the subtasks are complicated. For each of them, make a list of the steps needed in order to perform the subtask. See the "Help" section at the end of this document.You must have one of the TAs check this before proceeding to the next step.
Step 2: Data Table. For each task, write a list of inputs, outputs, and auxiliary data that it requires.
Step 3: Function Headers. Write function headers for the tasks and subtasks you identified in Step 1. Each header should include parameter names and types, a return type, and comments describing the purpose of the function, its inputs, its outputs, and what it returns. Have one of the TAs check this before proceeding to the next step.
Step 4: Copy and Read the program strings.c. The comments explain most of what's going on, but you should ask one of the TAs about anything you don't understand. Will it be on the final? Probably.
Step 5: Run and Debug your copy of strings.c. Make sure that it works correctly even if the user types strings that are too long. Have one of the TAs check this before proceeding to the next step.
Step 6: Finish your program. Write bodies for your functions. Write a short main program. Test it.
Help
- The table of string library functions on pages 489-490 of your
textbook is very useful. Don't try to rewrite them yourself.
- When you count the number of strings of each length, you should keep a table holding the 15 counts. An array of length 16 is convenient for this purpose. Will I get credit for doing it some other way? Probably not.
- Sorting a list of strings is a lot like sorting a list of numbers, except that you don't use <, <=, >, >=, and == to compare strings. You use strcmp() instead. Should I modify and re-use the sorting function that I handed in two weeks ago? Yes. Code recycling is one of the objectives of this week's lab.
- Once you have sorted the list of strings, identical strings will appear as blocks of consecutive strings in this list. The number of occurrences of a word is just the length of its block. Compute block lengths by counting. Will I get credit for doing it some other way? Probably not.
- Your program is expected to work reasonably even if the input is invalid. Overruning an array doesn't only produce wrong answers -- it's a security risk. Your program is also stored in memory. If a user overwrites your program, he might make it do anything.