Compiling and Running Programs

=============================================================================
Overview:  This lesson gives you a quick overview of how to compile and run
           C programs.  It addresses things like file names, interrupting
           programs, and using redirection.
=============================================================================

Section                            Topics
-------                            ------

   Compiling
   Interrupting a running program
   Redirection
   Include files and libraries
   Interpreted languages




Compiling
---------

     A computer cannot understand a program written in any of the standard
high level programming languages, like C, FORTRAN, Ada, LISP or C++.  All
programs must be translated into machine code.  A compiler is a program that
does this translation.
 
     Each language has to have its own compiler, because the compiler has to
have detailed knowledge of the structures of each language.  A C compiler 
would be unable to understand FORTRAN's rules; likewise you cannot use a C++
compiler to translate Pascal.  However, all compilers must produce machine
language programs for the same computer, namely the SUN.

     Compilers are generally very complicated programs, but using them is
usually fairly straightforward.  The compiler takes as input a program, called
the source program, written in the high level language.  After analyzing it
for its meaning, the compiler writes a new file which is in SUN machine
language.  This file is usually called a.out, which actually stands for
"assembler output".  (The assembler is an intermediate step in the compiling
process.)

     Here's an example.  Suppose that you create a very simple C program
and put it into a file called demo.c:

     % cat > demo.c
     #include <stdio.h>
     #include <string.h>
     int main(void)
     {
          char name[100];
          printf ("What is your name? ");
          /* fgets never writes past the end of name, unlike gets */
          if (fgets (name, sizeof name, stdin) == NULL)
               return 1;
          name[strcspn (name, "\n")] = '\0';   /* drop the newline */
          printf ("Well, hello there, %s\n", name);
          return 0;
     }
   (press CONTROL-D to end the file)

Now you compile it:

     % cc demo.c

If there are no errors (and there aren't above), the compiler will create the
file a.out.  This is a machine language program for the SUN.  It will not run
on your Macintosh or Gateway 2000 since these computers use a different
chip and hence a different machine language.

     To run the program, just type in its name.  (If the shell reports that it
cannot find the command, the current directory is not in your search path;
type ./a.out instead.)  Below is shown the interaction you would see:

     % a.out
     What is your name? Binky
     Well, hello there, Binky
     %

You can probably figure out that this C program prints out a question,
"What is your name?", and then waits for your answer by using fgets.  It
stores the response in a variable called "name" and then prints it back out.


Interrupting a running program
------------------------------

     Programmers seldom write programs that run right the first time.  Errors
that occur during run-time are called "bugs," as opposed to errors that show up
during the compilation phase, which are syntax errors.  Bugs never go away with
increasing programming experience -- they merely get more subtle.

     A common bug is for the program to run forever and not relinquish control
back to the user, something affectionately known as an infinite loop.  In UNIX,
you can kill a program that is in an infinite loop by pressing CONTROL-C.
Actually, you can use CONTROL-C to stop any program or command, especially if
it is taking too long or if you change your mind about using it after you start
it.  Beware that you might have to clean up any temporary files that the program
created but was unable to delete upon completion.

     A few commands may not be in infinite loops but are instead waiting for
input.  If that is the case, press CONTROL-D, which signals the end of the
input, before resorting to killing the command.

     Some programs and commands have a way of trapping attempts to kill them
so typing CONTROL-C will not kill them.  There are several other things you can
try.  First, try CONTROL-\.  If that doesn't work, press CONTROL-Z which does
not kill jobs but suspends them.  If you see the following

     stopped
     %

then you know that you were successful in suspending the job.  Notice the shell
prompt.  Type in the following:

     % kill %1

Here, the %1 means "job #1".  Hopefully this will kill the program.

     If the program you are trying to wrest control from still does not respond
to anything, there is a way to kill it using the commands "ps" and "kill".
You must find out the PID of the program, the "Process IDentification number".
Here's a sample output of "ps":

     % ps
       PID TT STAT  TIME COMMAND
     15201 p1 R     0:00 ps
     29277 p1 S     0:02 -sh (csh)
     14547 p3 S     0:07 a.out
     29311 p3 IW    0:05 -sh (csh)

Suppose that you wanted to kill the a.out process.  Its pid is 14547, so to
kill it, do

     % kill -9 14547

But you must do these steps from another terminal or window, since the errant
program has locked up your own window.  Nothing can refuse the kill -9 command
(-9 specifies that you are using the unstoppable kill signal), but if things
STILL do not seem right, contact a faculty person.


Redirection
-----------

     UNIX has an extremely flexible system for changing the destination of
output and the source of input.  This is called redirection or I/O redirection.

     The C program in the previous section read from the keyboard and wrote to
the screen.  Every UNIX process has three standard files that it defines:

     stdin  --  standard input, usually the keyboard
     stdout --  standard output, usually the terminal screen
     stderr --  standard error output, usually the terminal screen also

However, any of these can be redirected.  If you want to send the output to
a file, rather than to stdout, you can use the > symbol to redirect it:

     % a.out > my.output

Of course, this is rather bad for the above program since it is interactive,
that is, it engages you in a dialog.  If you can't see the questions, because
they are being stashed in the file "my.output", how can you answer them?  
However, other programs, especially filters, just produce output, so it is
very appropriate to redirect their output.

     As an example of appropriate use of redirection, consider the following
C program that does simple statistics on a list of real numbers.  It prints 
the sum, average, minimum and maximum values of the input numbers.  The file
name is stat.c:

     #include <stdio.h>
     #include <stdlib.h>
     int main(void)
     {
          float val, min = 1.0e10, max = -1.0e10, sum = 0.0, average;
          int num = 0;
          char line[80];
          /* fgets returns NULL at end of input */
          while (fgets(line, sizeof line, stdin) != NULL) {
               val = atof(line);
               if (val > max) max = val;
               if (val < min) min = val;
               num++;
               sum += val;
          }
          if (num == 0) return 1;     /* no data, nothing to report */
          average = sum / num;
          printf ("Sum     = %15.8f\n", sum);
          printf ("Count   = %15d\n", num);
          printf ("Average = %15.8f\n", average);
          printf ("Minimum = %15.8f\n", min);
          printf ("Maximum = %15.8f\n", max);
          return 0;
     }

This program reads a series of numbers from stdin by using fgets.  There
is no interaction with the user.  Once the end of the data stream occurs,
that is when fgets returns the value NULL, the program calculates the
average and prints out the resulting statistics.

     This program can be used interactively from the terminal, with the user
typing in the dataset.  To signal end of file on a terminal, the user types
CONTROL-D.  Here's an example:

     % cc stat.c
     % a.out
     6.7
     2.48
     1.5
   (user presses CONTROL-D)
     Sum     =     10.68000031
     Count   =               3
     Average =      3.56000018
     Minimum =      1.50000000
     Maximum =      6.69999981
     %

     To redirect input, use <.  For example:

     % a.out < some.input

where "some.input" contains the dataset.  In the case of the stat program above,
it would be a list of real numbers, like the ones that were typed in from the 
terminal.  If only input is redirected, the user will see the output on her 
terminal screen.

     Stdin and stdout can be redirected independently of each other, or both 
at the same time.  Here's an example:

     % a.out < some.input > some.output

It doesn't matter if "> some.output" appears before or after "< some.input".

     We will not discuss herein how to redirect stderr.  It is usually best
left pointing at the terminal anyway, since we often want to see the errors
rather than have them disappear into a file.

     The power of redirection is that the input data can come from the user
sitting at the terminal, or from a prepared file.  Likewise, output can be
displayed at the terminal, or it can be saved in a file for later use, perhaps
in a report or a help file or sent to the printer.  The decision of when and
where to send the output or get the input can be easily changed by the user
without changing the source program.

     If the C program, however, opens a specific file and reads from it, or
writes to a specific file, then redirection will have no effect.  Only the
three files "stdin", "stdout", and "stderr" can be redirected.


Include files and libraries
---------------------------

     Since UNIX and C go hand in hand, much attention is usually devoted to
C programs, though there is no rational reason other than historical precedent
for this being so.  But to honor history, a little more attention will be
given here to C.

     The C header files, whose names by convention end in ".h", contain
declarations that set up types, constants, external functions and other
information.  These files are brought into C source programs by the #include
line:

     #include <stdio.h>

The angle brackets mean something special to the C preprocessor, which is the
UNIX program that handles the first part of the compiling process for C
programs.  It interprets the #include and #define directives, among a few
others.

     The C preprocessor looks at the file name given on the include line, and
ascertains whether it has angle brackets or double quotes around it.  If angle
brackets, then it looks for the file in a well-known directory, usually

     /usr/include

If there are double quotes, then it looks first in the directory of the source
file (normally the current working directory), unless the file name is given
in full, such as

     #include "/usr1/local/doc/C/HEADERS/junk.h"

     All computer programming languages come with a set of subprograms already
written and debugged to perform the usual tasks, like input and output.  These
subprograms are stored as object code, not source code, so as to avoid the
cost of recompiling.  Instead, they are just copied into your new object
program, stored in file a.out, and special links are made to tie them into
your code.  This whole process is called "linking" or "link editing".

     Examples of subroutines that are provided in these standard libraries
are printf() and atof(), both used in the programs above.  (atof stands for
"ASCII to floating point"; it converts strings to internal real numeric
form.)

     The subroutines are stored in what are called archive libraries, which are
files whose names end in .a.  These have a table of contents indexing the
subroutines in the file so they can be located quickly by the linker program, 
which does the piecing together of the object code segments into the final 
a.out.

     In most UNIX systems, the libraries are stored in the directory

     /usr/lib

They use a particular naming convention that shortens options on the cc command.
All the library files have a name of the form

     /usr/lib/libX.a

where the X is the important part.  For example, the standard C library, which
contains gets() and printf(), is called /usr/lib/libc.a.

     If you need to use various math functions, like sqrt(), cos(), and exp(),
you will find them in the library /usr/lib/libm.a, which is the math library.
The linker needs to know where to get the code segments out of a particular 
library, so you would tell it to access /usr/lib/libm.a by using the option -lm
on the cc command:

     cc myprog.c -lm

The "m" corresponds to the X portion out of the library's name, libm.a.

     This is usually only one of two things you must do, the other being to
include the file /usr/include/math.h into your source program.  To do this,
put the following into your file near the top:

     #include <math.h>

Since the math.h file is in the standard place, you need only put angle
brackets around it.

     The reason why you have to do both things is that you must inform the
compiler of the functions that you will be using by means of extern function
declarations, and these are found in the header file.  For example, in the
file math.h, you will find

     extern double sqrt(/* double */);

which is needed in order to tell the C compiler that sqrt is a function that
returns a double precision floating point number.

     But the actual object code that does the square root operation is in the
archive library /usr/lib/libm.a, and the linker, which is a separate program,
needs to know this.  Thus, you communicate the name of the archive library
file to the linker by using the shorthand code -lm on the cc statement.

     The above discussion hints at the fact that the cc command is quite
complex.  In fact, it is what is called a "wrapper" because it wraps together
several different stages in the translation process.  Here they are:

     cpp   --- C preprocessor, handles the #include and #define lines
     ccom  --- the actual C compiler, generates assembly language
     as    --- assembler, turns the assembly language into object code
     ld    --- linker, combines the object code with library routines to
               produce the final executable file a.out

Other compilers are also wrappers, like "pc" which is the Pascal compiler
and linker.  However, some languages, such as Ada, require you to carry out
some of these steps explicitly.


Interpreted languages
---------------------

     Some programming languages are not translated in the same way as C.  This
causes these languages to be used in a slightly different way.  To
make the distinction clearer, let's review how C does it.

     You prepare a C source program in a file, and then compile it.  This yields
the executable file a.out.  You can run a.out as many times as you like, never
needing to recompile.

     An interpreted language, like LISP, works in the following way.  You may
prepare your source program in a file, although you don't have to.  Then you
start up the LISP system and ask it to read the source program file.  It does,
and stores the program in memory and lets you run it.  However, you can modify
the source program while it is in memory and rerun it.  All the stages are
combined in such languages.

     There are sound reasons for choosing one style over the other, and these
are addressed in CSC 251.  However, suffice it to say that interpreted languages
are often much more complex and flexible, and permit tinkering and changing
while you run the program.  This is one of their strengths, because you don't
have to wait to recompile the whole program, which may take hours if the
program is very large.

     As you might imagine, if the translation process is combined with the
running process, then every time you want to run a LISP program, the system
has to actually retranslate parts of it, slowing everything down.  This, then,
is a weakness of interpreted languages.

     Most interpreted languages, such as LISP, use a subsystem, with its own
prompt and its own set of commands to edit a program, run it, and so forth.
When you start up the LISP interpreter at Canisius, you will start to interact
with such a subsystem.  Its prompt is the greater than sign:

     % ibcl
     >

(It actually shows you a bunch of copyright warning messages first.)

     There is often a need to temporarily escape to UNIX while in one of these 
subsystems in order to do a file command, check mail, or any number of tasks.  
Here's how you could do a listing command while in ibcl:

     % ibcl
     > (system "ls")

The UNIX command is surrounded by double quotes.

     However, you can go one step further and create a nested shell:

     % ibcl
     > (system "csh")
     %

This actually puts the ibcl subsystem to sleep while it gives you a new C
shell (with the % prompt).  You can use this to do any number of tasks,
even running ibcl again!  When you are done, get rid of this shell by
either pressing CONTROL-D or typing the command

     % exit