CIS 1057. Computer Programming in C

Chapter 2. Overview of C

1. C Language Elements

A simple C program, like the one in this sample program, has two parts: preprocessor directives and the main function.

The text between a "/*" and the next "*/" is a comment, which will be read by programmers, but ignored by compilers.

Preprocessor directives tell the preprocessor how to modify the program before compiling it. Each directive is marked with the symbol "#".

In this example, the first line gives the program access to a library, that is, a collection of ready-made functions that carry out commonly used tasks. In this case, the library provides the standard input/output operations. The declarations the program needs are inserted into it before compilation.

The second line in the preprocessor directives defines a constant macro, that is, a name for a number. The preprocessor replaces the name with the number throughout the program before compilation. Only data values that never change during execution should be defined in this way; doing so makes the program easier to understand and maintain.

The main function is the entry point of execution. Every C program has one and only one main function. The first two lines form the header of the function, while the lines in the following "{ }" form the body of the function.

The header of a function specifies the function's name, input, and output. The body of a function specifies what to do when the function is called (i.e., used).

A function body has two parts: declarations and executable statements. The former tell the compiler how much memory the program needs; the latter are translated into machine language and later executed.

Every programming language has a fixed set of reserved words, which have predetermined meanings. A list of the reserved words of C is given here.

User-defined identifiers (names) are restricted by the following rules:

An identifier must consist only of letters, digits, and underscores.
An identifier cannot begin with a digit.
A C reserved word cannot be used as an identifier.
An identifier defined in a C standard library should not be redefined.

It is better to give identifiers meaningful names to make the program readable to human users.

C compilers distinguish between uppercase and lowercase letters, so rate and Rate are two different identifiers.

2. Variable Declarations and Data Types

The memory locations used for data are called variables, because the values stored there can be changed as the program runs.

Variable declarations inform the compiler of the variable names to be used in the program, as well as their types, which specify the amount of memory needed.

A variable declaration is a type identifier followed by a list of user-defined variable identifiers.

Common data types in C include int, double, and char.

C has int and double for numerical values. Though integers can also be stored in double variables, int usually provides better space and time efficiency, because an int value typically occupies less memory and integer arithmetic is simpler. Also, int values are stored and processed exactly, while double values are often only approximations.

Concretely speaking, an integer is represented directly by a binary number (such as 110 for 6), while a floating-point number is represented by a pair [mantissa, exponent] for mantissa × 2^exponent (such as [0.75, 3] for 6.0).

The range of numbers varies from implementation to implementation. For instance, the following is the result in a Unix gcc implementation:

Range of positive values of type int: 1 . . 2147483647
Range of positive values of type double: 2.225074e-308 . . 1.797693e+308

Here the double values are represented in scientific notation, so the range is [2.225074 × 10^-308, 1.797693 × 10^308].

ANSI C provides several integer data types, with their ranges in typical microprocessor implementations listed in Table 7.1 of the textbook:

TYPE             RANGE
short            -32767 .. 32767
unsigned short   0 .. 65535
int              -2147483647 .. 2147483647
unsigned         0 .. 4294967295
long             -2147483647 .. 2147483647
unsigned long    0 .. 4294967295

Similarly, ANSI C defines three floating-point types that differ in their memory requirements. The following table lists their approximate ranges and significant digits in typical microprocessor-based C implementations:

TYPE             DIGITS       RANGE
float            6            10^-37 .. 10^38
double           15           10^-307 .. 10^308
long double      19           10^-4931 .. 10^4932

A char value is a single character. A char constant is written between a pair of single quotation marks, as in 'A'. Characters can be compared using the equality and relational operators.

Within memory, each character is represented by a corresponding numeric code. Three common character codes are shown in Appendix A of the textbook. The most often used code is ASCII.

3. Executable Statements

Executable statements in a program carry out specified operations (each by one or more instructions in a machine language). Each statement ends with a semicolon ';'.

An assignment statement in C uses the equal sign '='. On the left-hand side of the sign is a variable name, and on the right-hand side is an expression whose value is put into the variable. In this context the sign '=' is read as "gets" or "becomes", not as "equals". For example, you can write

    sum = sum + 1;

where the value stored in sum is increased by 1.

An input operation transfers some data from the outside into memory. An output operation transfers some data from memory to the outside. The most common input/output operations in C are carried out by functions defined in library stdio, and can be called with various arguments.

The sample program uses printf to display text and numbers in given formats.

The sign "\n" is the newline escape sequence.

A placeholder begins with the symbol '%', followed by one or two letters indicating the type of the expected argument. For instance,

        Placeholder    Variable Type    Function Use
        %c             char             printf/scanf
        %d             int              printf/scanf
        %f             double           printf
        %lf            double           scanf

An input/output format can have more than one placeholder (for more than one variable).

In the sample program, the function call

    scanf("%lf", &miles);

puts the next input double value (from the keyboard) into variable miles. Please note the ampersand sign '&' before the variable name. This operator tells the function where to find the location in memory to put the new input value. After scanf is called, the program waits for the user to type something and press the "Return" (i.e., "Enter") key.

In the sample program, the last line in the main function is

    return(0);

which reports that the program ends as expected.

4. A program

In general, a C program has the following form:

    preprocessor directives
    main function heading
    {
        declarations
        executable statements
    }

The C compiler treats line breaks as spaces, so a statement can extend over multiple lines.

Proper program style, such as the use of spaces and comments, improves the readability of a program.

5. Expressions

Arithmetic expressions in C are very similar to their formats in mathematics.

Operators '+', '-', '*', and '/' can be used between two int values or two double values to produce a result of the same type. In integer division, the fractional part of the result is lost. If one of these operators is used between an int value and a double value, the result is double.

The remainder operator '%' returns the integer remainder of dividing its first operand by its second. Both operands must be integers.

An assignment of a double value into an int variable will cause the fractional part to be lost. An assignment of an int value into a double variable will not change the value, but its type changes.

The type of an expression can be changed by "casting", that is, by placing the desired type in parentheses before the expression.

When there are multiple operators in an expression, the order convention is the same as in mathematics: it includes the parentheses rule, the precedence rule, and the associativity rule.

CASE STUDY: Evaluating a Collection of Coins

Problem: To determine the value of a collection of coins.

Analysis: The input is the count of each type of coin. For each type, multiply the count by the unit value. Then add all the products together to get the total amount. Finally, display the total.

Data: input includes a customer's initials (3 char values) and counts (4 int values); output includes the dollar amount and change (2 int values); program variables include a total amount in cents.

Design:

Get and display customer's initials.
Get the count of each kind of coin.
Compute the total value in cents.
Find the value in dollars and change.
Display the value in dollars and change.

These steps can be further refined to get a detailed algorithm.

Implementation: FIGURE 2.13.

Testing: try different combinations, including general cases and special cases.

6. Formatting

All numbers are displayed by printf in their default format, unless the program specifies otherwise.

For values of type int, an integer placed between the '%' and the 'd' specifies the field width, i.e., the number of columns in which the value is displayed. The value is right-justified, with spaces added on the left if necessary, and the minus sign of a negative number takes up one column. If the field is not wide enough, it is expanded to the right so that no digits are lost. In this sense, the default field width is 1.

The following table shows some examples.

Value    Format    Display    Value    Format    Display
234      %4d        234       -234     %4d       -234
234      %5d         234      -234     %5d        -234
234      %6d          234     -234     %6d         -234
234      %1d       234        -234     %2d       -234

For values of type double, both the field width and the number of decimal places (the value is rounded to that many places) can be specified. They are written between '%' and 'f', separated by a '.'. If only the field width is specified, the '.' can be omitted.

Value    Format    Display    Value    Format    Display
3.14159  %5.2f      3.14      3.14159  %4.2f     3.14
3.14159  %3.2f     3.14       3.14159  %5.1f       3.1
3.14159  %5.3f     3.142      3.14159  %8.5f      3.14159
.1234    %4.2f     0.12       -.006    %4.2f     -0.01
-.006    %8.3f       -0.006   -.006    %8.5f     -0.00600
-.006    %.3f      -0.006     -3.14159 %.4f      -3.1416

7. Modes and files

There are two basic modes of computer operation: batch mode and interactive mode.

If a program is designed to run in interactive mode, it exchanges data with the user when it runs, usually through the keyboard/mouse and the monitor, as in the case of the sample program in FIGURE 1.13.

A similar function can be achieved by another program that works in batch mode. In UNIX/Linux, a C source file filename.c can be compiled with "gcc filename.c", and the resulting executable file can then be run as "a.out". A batch-mode program gets its input data from a data file prepared in advance, and puts its output data into another file, as does the program in FIGURE 2.14. It differs from the previous one in that it does not prompt the user for input. Instead, it echoes each input value as a confirmation.

The operating system provides input redirection, so that scanf gets its data from a file rather than from the keyboard (which is the standard input). In UNIX and MS-DOS, input can be redirected from the file mydata by adding "<mydata" to the command line, that is, by changing the execution command from "a.out" to "a.out <mydata". Consequently, the scanf in the program reads the first line of the data file. After that, the following printf statement echo-prints the value the program just read, so that the user can follow the progress of the program.

Output redirection works in a similar way: changing the execution command from "a.out" to "a.out >myoutput" sends the output of the program to the file myoutput instead of the screen. The two redirections can be combined into "a.out <mydata >myoutput".

8. Errors

In the development process of a program, various types of error may be made. The process of removing the errors, or bugs, is usually called "debugging".

A syntax error occurs when the source code violates certain grammar rules of C, and is detected by the compiler during the translation process from source code to object code. Such an error will cause an error message to be reported by the compiler. FIGURE 2.15 shows an example.

When a compiler sends out an error message, the position reported is where the compiler recognized the error, which is not necessarily where the mistake was made.

Not all errors can be found by a compiler. A run-time error occurs during the execution of a program, when the program attempts to perform an illegal operation, as in the case of FIGURE 2.16.

Some errors do not trigger any error message, but lead to incorrect results. For example, FIGURE 2.17 is a program that produces a wrong result because of an error in input processing: the "\n" left in the input after the first scanf is mistakenly read by the second scanf. This happens because scanf skips any blanks and carriage returns in the input when it scans a numeric value, but skips nothing when it scans a character, unless the %c placeholder is preceded by a blank. The problem can be fixed by adding a space at the start of the format in the second scanf, turning it into "scanf(" %c%c%c", &first, &middle, &last)", so that leading blanks and carriage returns are skipped.

FIGURE 2.18 is another program that has a bug, but produces no error message.

Finally, a logic error is one caused by an incorrect algorithm. Such an error can only be recognized by comparing the actual output with the desired output.