CIS 1057. Computer Programming in C

Chapter 8. Strings

 

1. String basics

In computer programs, a string is a sequence of characters, and its length can be any non-negative integer.

In C, a string constant is written directly as a sequence of characters between a pair of double quotation marks (" "), and it can be given a name with the #define directive. For example,

#define ERR_PREFIX  "*****Error - "
#define INSUFF_DATA "Insufficient Data"
A string is implemented in C as an array of char, with a null character '\0' at the end. Characters after null are not considered as part of the string. For example, the declaration
char str[20] = "Initial value";
will have the following memory content associated with the variable name str:
[0]         [4]           [9]         [13]              [19]
 I  n  i  t  i  a  l    v  a  l  u  e  \0
Consequently, the size of the array must be at least one more than the length of the string stored in it.
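The relation between array size and string length can be checked with a small helper. This is only a sketch: the function name spare_capacity is hypothetical, and it assumes the array really holds a null-terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of unused char slots in an array that
   holds the string s. Assumes s is null-terminated inside the array. */
size_t spare_capacity(const char *s, size_t array_size)
{
    /* strlen does not count the '\0', so one extra slot is in use */
    return array_size - (strlen(s) + 1);
}
```

For the declaration char str[20] = "Initial value"; the length is 13, the '\0' sits at index 13, and spare_capacity(str, sizeof str) gives 6.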

An array of strings is a two-dimensional array of characters. For example,

char month[12][10] = {"January", "February", "March", "April", "May", "June",
                      "July", "August", "September", "October", "November", "December"};
When using printf and scanf, the first parameter is always a string, and may contain placeholders (starting with a '%'). A placeholder in printf for another string is written as '%s'. Adding an integer between the two characters specifies the minimum field width of the string, as for other types of placeholders.

Placing a minus sign prefix on a placeholder's field width causes left justification as in FIGURE 8.1:
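The effect of the field width and the minus sign can be sketched as follows. The function name format_columns and the '|' delimiters are assumptions made for illustration; snprintf is used instead of printf only so that the result can be inspected, and the placeholders behave the same way in printf.

```c
#include <stdio.h>

/* Hypothetical helper: writes s twice into out, first right-justified
   and then left-justified, each in a field at least 8 characters wide. */
int format_columns(char *out, size_t out_size, const char *s)
{
    return snprintf(out, out_size, "|%8s|%-8s|", s, s);
}
```

With s being "Short", the result is "|   Short|Short   |". Since the width is only a minimum, a string longer than 8 characters is written in full.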

Similarly, scanf can read a string by using '%s'. Since an array name used as an argument is passed as the address of its first element, the '&' operator should not be applied to it. The function skips leading whitespace characters, starts at the first non-whitespace character, copies it into the character array, and continues copying the following characters until a whitespace character is encountered. Finally, a null character is stored after the last copied character. Note that a plain '%s' does not check the size of the array, so an input word longer than the array overflows it.
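The scanning rule for '%s' can be tried out on a string instead of the keyboard. The sketch below uses sscanf, which applies the same rule to a string argument; the names first_word and WORD_SIZE are hypothetical. A width of 9 is written in the placeholder so that at most 9 characters plus the '\0' are stored in a 10-element array.

```c
#include <stdio.h>

#define WORD_SIZE 10   /* assumed array size for this sketch */

/* Hypothetical helper: copies the first whitespace-delimited word of
   input into word. Returns 1 if a word was stored. */
int first_word(const char *input, char word[WORD_SIZE])
{
    /* no '&' before word: the array name is already an address */
    return sscanf(input, "%9s", word);
}
```

For the input "   hello world" the word stored is "hello". The leading blanks are skipped and scanning stops at the blank after the 'o'.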

An example of string input/output is in FIGURE 8.2. The content of corresponding memory is shown in FIGURE 8.3:

 

2. String library functions: Assignment and Substrings

For strings, the assignment operator '=' can only be used in initialization, such as
char con_str[20] = "Testing string";  /* works fine */
but cannot be used later to change the value, such as
char one_str[20];
one_str = "Testing string";           /* does not work */

Instead of using the assignment operator, string manipulation in C is carried out by library functions declared in string.h. Among them, strcpy copies one string into another, and strncpy copies at most the first n characters from one string into another. Neither function checks the size of the destination, so the caller must make sure that the destination has enough space for what is copied. Also, strncpy does not store the ending null character when the source has n or more characters. For example, see FIGURE 8.5 for the result of strncpy(result, s1, 9):

In such situations, the '\0' can be added with an assignment like

dest[dest_len - 1] = '\0';
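Putting strncpy and the explicit '\0' together gives a copy that never overflows the destination and always leaves a valid string. This is a minimal sketch; the function name copy_truncated is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest, keeping at most
   dest_len - 1 characters, and always null-terminates dest. */
void copy_truncated(char *dest, size_t dest_len, const char *src)
{
    if (dest_len == 0)
        return;                        /* no room even for the '\0' */
    strncpy(dest, src, dest_len - 1);  /* may leave dest unterminated */
    dest[dest_len - 1] = '\0';         /* so terminate it explicitly */
}
```

Copying "Testing" into a 5-element array this way stores "Test".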

Calling these functions with the address of a subscripted element operates on a substring, because a string name indicates nothing but the starting address of the array to be processed. FIGURE 8.6 shows the result of strncpy(result, &s1[5], 2):
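The same idea can be wrapped in a small substring function. The sketch below is an illustration only: the name substring is hypothetical, and it assumes that start is not past the end of src and that dest has room for count + 1 characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies up to count characters of src,
   beginning at index start, into dest and null-terminates the result.
   Assumes start <= strlen(src) and dest holds count + 1 chars. */
char *substring(char *dest, const char *src, size_t start, size_t count)
{
    strncpy(dest, &src[start], count);  /* &src[start] is where the substring begins */
    dest[count] = '\0';
    return dest;
}
```

For example, substring(result, "January 30", 8, 2) stores "30" in result.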

The program in FIGURE 8.7 breaks compounds into their elemental components, assuming that each element name begins with a capital letter. The library function strlen returns the number of characters in a string, without counting the null character.
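The core step of such a program can be sketched as below. This is not the code of FIGURE 8.7: the function name next_element is hypothetical, isupper comes from ctype.h (see Section 6), and the sketch assumes start is a valid index at which an element name begins.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the element that begins at
   compound[start] into elem and returns the index at which the next
   element begins (or the length of compound if there is none). */
size_t next_element(const char *compound, size_t start, char *elem)
{
    size_t len = strlen(compound);
    size_t end = start + 1;

    /* the element continues until the next capital letter */
    while (end < len && !isupper((unsigned char)compound[end]))
        end++;

    strncpy(elem, &compound[start], end - start);
    elem[end - start] = '\0';
    return end;
}
```

Starting at index 0 and calling the function repeatedly on "H2SO4" yields "H2", "S", and "O4".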

 

3. Longer Strings: Concatenation and Whole-Line Input

Library functions strcat and strncat modify their first argument by adding all or part of their second argument at the end of the first. This operation is called "concatenation", and it assumes that the destination has enough space to hold the added characters. Please note that the null character at the end of the destination is overwritten by the first added character, and a new null character is stored after the last one.
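Because strncat always stores a '\0' after the characters it adds, the space left in the destination has to be computed with one slot reserved for it. A minimal sketch, with the hypothetical name append_bounded:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends as much of src as fits into dest,
   whose total array size is dest_size. Assumes dest already holds a
   null-terminated string. */
void append_bounded(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);

    if (used + 1 < dest_size)
        /* third argument: characters to add, not counting the '\0' */
        strncat(dest, src, dest_size - used - 1);
}
```

With char d[10] = "Jan"; the call append_bounded(d, sizeof d, "uary 2024") leaves "January 2" in d, which is 9 characters plus the '\0'.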

The functions defined on strings cannot take a char as an argument. If you wish to add a single character at the end of a string, you should view the string as an array, and use assignment to subscripted elements for access. Be sure to include the null character at the end of the string.
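Adding one character by subscript assignment might look like the following sketch. The function name append_char and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: adds ch at the end of the string in s, an
   array of the given size. Returns 1 on success, 0 if there is no room. */
int append_char(char *s, size_t size, char ch)
{
    size_t len = strlen(s);

    if (len + 1 >= size)
        return 0;          /* need room for ch and the new '\0' */
    s[len] = ch;           /* overwrite the old '\0' */
    s[len + 1] = '\0';     /* and put a new one after it */
    return 1;
}
```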

For input of a complete line of data (i.e., ended by '\n'), the stdio.h library provides the functions gets and fgets as alternatives to scanf and fscanf, respectively.

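The function gets reads the input characters (including whitespace) into a string until a '\n' is encountered, which is replaced by '\0' in the string. Since gets has no way to know the size of the string, a long input line overflows it. For this reason gets was removed from the language in the C11 standard, and fgets with stdin as the file is the usual substitute.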

The function fgets reads from a file, and it has an argument n giving the size of the string, so at most n - 1 characters are read. If '\n' is encountered before that limit, it is stored into the string, followed by the additional '\0'. Otherwise the line is cut at the limit, and the '\0' is added after the last character read. When the end of the file is reached before any character is read, fgets returns NULL. For example, the program in FIGURE 8.8 turns an input file into an output file that is double spaced and has numbered lines.
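A loop of the kind such a program needs can be sketched as follows. This is not the program of FIGURE 8.8: the names number_lines and LINE_SIZE and the ">>" output format are assumptions, and the sketch assumes every input line is shorter than LINE_SIZE (a longer line would be read, and numbered, in pieces).

```c
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 80   /* assumed maximum line length, including '\0' */

/* Hypothetical helper: copies in to out with numbered, double-spaced
   lines. Returns the number of lines read. */
int number_lines(FILE *in, FILE *out)
{
    char line[LINE_SIZE];
    int count = 0;

    /* fgets returns NULL when no more characters can be read */
    while (fgets(line, LINE_SIZE, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* drop the stored '\n', if any */
        count++;
        fprintf(out, "%3d>> %s\n\n", count, line);
    }
    return count;
}
```

Removing the '\n' first makes the output the same whether or not the last line of the file ends with a newline.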

 

4. String comparison

Though the comparison operators "<", ">", and "==" can be used between characters, they cannot be directly used between strings to compare their contents (though they can be used to compare the addresses of strings).

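The standard string library provides the int function strcmp for comparison of two strings. For a call strcmp(str1, str2), the function distinguishes three situations: it returns a negative integer if str1 is less than str2, zero if the two strings are equal, and a positive integer if str1 is greater than str2.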
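Only the sign of the value returned by strcmp is meaningful, so a program should test it against zero rather than against a particular number. The sketch below reduces the result to -1, 0, or 1; the function name order_of is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: -1 if a comes before b, 0 if the strings are
   equal, 1 if a comes after b. */
int order_of(const char *a, const char *b)
{
    int result = strcmp(a, b);

    if (result < 0)
        return -1;
    if (result > 0)
        return 1;
    return 0;
}
```

The comparison is by character code, so on a system using ASCII every capital letter comes before every lowercase letter, and "Zebra" is less than "apple".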

FIGURE 8.9 shows how to rewrite the selection sort program to sort strings.
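A selection sort over an array of strings might look like the sketch below. It is not the code of FIGURE 8.9: the names sort_strings and STR_SIZE are assumptions. Compared with sorting numbers, the comparison uses strcmp and the exchange uses strcpy.

```c
#include <string.h>

#define STR_SIZE 20   /* assumed size of each string in the array */

/* Hypothetical helper: sorts the first n strings of list into
   ascending order by selection sort. */
void sort_strings(char list[][STR_SIZE], int n)
{
    char temp[STR_SIZE];
    int fill, i, min;

    for (fill = 0; fill < n - 1; fill++) {
        /* find the smallest string in list[fill..n-1] */
        min = fill;
        for (i = fill + 1; i < n; i++)
            if (strcmp(list[i], list[min]) < 0)
                min = i;

        /* exchange it with list[fill]; '=' cannot be used on arrays */
        if (min != fill) {
            strcpy(temp, list[fill]);
            strcpy(list[fill], list[min]);
            strcpy(list[min], temp);
        }
    }
}
```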

FIGURE 8.10 shows how to use a sentinel controlled loop for string input.

 

5. Arrays of pointers

The program section in FIGURE 8.11 exchanges two strings in an array, as shown in FIGURE 8.12.

Since each element of list is itself an array of characters, it is passed to a function as a pointer, that is, as the address of the array's element at index 0. To improve the time efficiency of the program, we can introduce an array of pointers and sort by exchanging only the pointers to the strings, without actually exchanging the array elements themselves, as shown in FIGURE 8.13:

Another benefit of this approach is that the original order of the strings in the array is still available. A complete program is in FIGURE 8.14.
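The pointer-exchanging version can be sketched as follows. It is not the program of FIGURE 8.14, and the function name sort_pointers is hypothetical. Each exchange moves one pointer instead of copying whole strings.

```c
#include <string.h>

/* Hypothetical helper: reorders the n pointers in ptrs so that the
   strings they point to are in ascending order. The strings
   themselves are not moved. */
void sort_pointers(const char *ptrs[], int n)
{
    const char *temp;
    int fill, i, min;

    for (fill = 0; fill < n - 1; fill++) {
        min = fill;
        for (i = fill + 1; i < n; i++)
            if (strcmp(ptrs[i], ptrs[min]) < 0)
                min = i;

        if (min != fill) {
            temp = ptrs[fill];        /* plain assignment works on pointers */
            ptrs[fill] = ptrs[min];
            ptrs[min] = temp;
        }
    }
}
```

If the pointers are first set to list[0], list[1], and so on, then after the call ptrs gives the strings in alphabetical order while list still holds them in the original order.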

 

6. Character operations

To work on a string, it is often necessary to work on individual characters in it.

The stdio.h library function getchar reads a single character from the standard input (keyboard). The following statements put the same value in ch:

  scanf("%c", &ch);  // (1)
  ch = getchar();    // (2)
A difference between the two is that the call in (1) returns the number of items successfully read (1 in the above example), while the call in (2) returns the int code of the character, or EOF when no character can be read.

Similarly, the stdio.h library function call getc(inp) returns a character from file inp.

As an example, the function in FIGURE 8.15 gets one line from standard input and puts it into a string.
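A line-reading function built on single-character input can be sketched as below. It is not the code of FIGURE 8.15: the name scan_line is hypothetical, and it reads with getc from a file argument, so passing stdin gives the keyboard version. It assumes dest_len is at least 1.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from inp into dest, storing at
   most dest_len - 1 characters. The rest of a long line is read and
   discarded. The '\n' is not stored. Returns dest. */
char *scan_line(FILE *inp, char *dest, int dest_len)
{
    int ch;      /* int, not char, so that EOF can be recognized */
    int i = 0;

    while ((ch = getc(inp)) != '\n' && ch != EOF) {
        if (i < dest_len - 1)
            dest[i++] = (char)ch;
    }
    dest[i] = '\0';
    return dest;
}
```

The string is passed in as an output parameter, so its memory belongs to the caller.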

Corresponding to the input functions getchar and getc, there are output functions putchar and putc:

  putchar('a');     // write 'a' to standard output
  putc('a', outp);  // write 'a' to file outp
The C library ctype.h includes functions for character classification and conversion. Using them, a function in FIGURE 8.16 converts the lowercase letters in a string into uppercase, and another one compares strings while ignoring the case of the letters.
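Two functions of this kind can be sketched with toupper and tolower. They are not the code of FIGURE 8.16, and the names upper_in_place and compare_ignore_case are hypothetical. The casts to unsigned char are there because the ctype.h functions expect a non-negative character code.

```c
#include <ctype.h>

/* Hypothetical helper: converts every lowercase letter in s to
   uppercase. Other characters are left unchanged. */
void upper_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Hypothetical helper: like strcmp, but treats corresponding
   uppercase and lowercase letters as equal. */
int compare_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}
```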

 

7. String/number conversions

The stdio functions scanf and printf can convert strings to and from numbers, respectively. Functions fscanf and fprintf do similar things with files.

Functions sscanf and sprintf carry out the same processes with a string as the first argument, that is, sscanf decomposes a string into its components, while sprintf composes a string from its components.
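As an illustration, the sketch below decomposes a date string with sscanf and composes another one with snprintf, the variant of sprintf that also takes the size of the destination. The function names and the two date formats (month/day/year in, year-month-day out) are assumptions and are not taken from FIGURE 8.18.

```c
#include <stdio.h>

/* Hypothetical helper: extracts three integers from text such as
   "12/25/2024". Returns 1 if all three were found, 0 otherwise. */
int parse_date(const char *text, int *month, int *day, int *year)
{
    /* sscanf returns the number of items successfully converted */
    return sscanf(text, "%d/%d/%d", month, day, year) == 3;
}

/* Hypothetical helper: writes the date into out as year-month-day,
   with leading zeros. */
int format_date(char *out, size_t out_size, int month, int day, int year)
{
    return snprintf(out, out_size, "%04d-%02d-%02d", year, month, day);
}
```

Checking the value returned by sscanf is a simple form of input validation: for the text "Dec 25" no integer can be converted, and parse_date returns 0.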

The program segment in FIGURE 8.17 is an example of input validation and error handling.

The program in FIGURE 8.18 converts among different representations of a date.

 

8. String processing illustrated

The problem is to implement a simple text editor. The structure chart is in FIGURE 8.19:

The program is in FIGURE 8.20. A sample run is shown in FIGURE 8.21:

 

9. Common errors

Be careful about memory allocation and management. For example, the function in FIGURE 8.22 uses a locally defined string as its return value. What is returned is the address of the string, not its content, and the string's memory is released when the function ends, so the caller receives an address that is no longer valid. This is a bug. The correct solution is to use an output parameter, as in FIGURE 8.15.
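The difference between the two designs can be sketched as follows. This is not the code of FIGURE 8.22: the function name make_label and its "Item" text are hypothetical. The buggy pattern is shown only inside a comment.

```c
#include <stdio.h>

/* Buggy pattern (do not use): label is local, so its memory is gone
   when the function returns, and the returned address is invalid.

       char *make_label(int n)
       {
           char label[20];
           sprintf(label, "Item %d", n);
           return label;
       }
*/

/* Output-parameter version: the caller provides the array, so the
   string is still valid after the function returns. */
char *make_label(char *out, size_t out_size, int n)
{
    snprintf(out, out_size, "Item %d", n);
    return out;
}
```

A caller declares its own array, for example char buf[20];, and passes it together with its size: make_label(buf, sizeof buf, 7).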

Don't forget the '\0' at the end of a string.