Directories              

================================================================================
Overview:  It is difficult to talk about files in UNIX without talking about
           the "homes" where those files live, namely directories.  Since many
           users are familiar with IBM PCs and Macintoshes, which have similar
           ways of clustering files, we shall spend less time explaining the
           directory concept than describing what commands UNIX allows us to do.
================================================================================

Section                            Topics
-------                            ------

   How files are organized into directories
   Pathnames
   Moving around in the file tree
   Making and destroying directories
   More about ls
   Moving files around
   Copying directories
   Linking directories


How files are organized into directories
----------------------------------------

     Files are storage entities which contain data.  Files with related purposes
are clustered together and stored in directories.  A UNIX directory is just like
a folder on the Macintosh, or a directory in MS-DOS (the operating system of
IBM PCs and their clones).

     In reality, a UNIX directory is itself a file.  It contains a list of names
of other files and directories, each paired with a pointer that tells UNIX
where to find that file.  Strictly speaking, the pointer is an index number
(called an "inode number") rather than a disk address; UNIX uses it to look up
the places on the disk where the data is stored.  UNIX manages all the disk
space for you, and it stores all files on disk in ways that you needn't even
think about.

     Most of the UNIX commands center around the file system.  Files in UNIX 
are kept in the directory tree.  The top level of this tree is called the root 
directory and is denoted by a single slash.  To see what files are at the root 
level, use the list command, ls:

        ls /

     All other directories and files are in the root directory.  Here's a 
picture of a typical directory tree:

                                    /
                                    |
             +----------+-----------+-----------+----------+
             |          |           |           |          |
            bin        usr         etc         tmp        dev
             |          |
        +----+----+     +-----+--------+
        |    |    |     |     |        |
       cat  ls   rm    joe   tammy    alice
                                       |
                              +--------+--------+
                              |        |        |
                           assigns   homewrk   notes
                              |
                         +----+----+
                         |    |    |
                         1    2   2.old

In this example (much simplified and reduced in size), there are five director-
ies in the root directory:  bin, usr, etc, tmp, dev.  These are standard names
for common directories, not just fictitious names.  "bin" holds many of the
executable commands like cat, ls and rm.  "usr" often holds the directories
that belong to the various users of the system.  In this example, there are
only three users: joe, tammy and alice, and each has his/her own directory.
Alice has two files in her directory and a subdirectory.  Each directory is 
actually the root of a smaller file tree.

     In summary, a directory "holds" or contains files and other directories.
There is one main directory for each UNIX system, called the root directory,
and its name is "/" (slash).  All files in a UNIX system are therefore "inside"
this root directory.
  
     We often conceptualize the file and directory relationships as an upside-
down tree.  (It should be called a root system, but calling it a "tree" is a 
historical convention.)  Each directory has a mini-tree inside it, and this 
system can become arbitrarily complicated.


Pathnames
---------   

     Each file or directory in a UNIX system has a name and also a position
within the file tree.  Its name is just a string of characters, which may not
include the slash.  The position is written down by listing all the directories
in which the file is contained, starting with the root directory and separating
the directory names with slashes.

     A "pathname" is a combination of directory names plus the file's name,
telling how to reach the file in the tree.  Here are some examples of
pathnames:

          /usr/joe
          /etc
          bin/rm
          /usr/alice/homewrk
          alice/homewrk
          assigns/2.old

Notice that every "name" in a pathname is a directory's, except for the
last name, which can be a directory or a file name.  Directory names are
separated from each other and from the final file name by a slash.

     Pathnames do not necessarily have to start with a slash.  If they start
with a directory name, such as "alice/homewrk", then they are called relative
pathnames because their real name is relative to the current working directory.
If a pathname starts with a slash, then it is called an absolute pathname and
it starts with the root directory.

     A simple file name is also a relative pathname.  Thus homewrk is both the
file's name and its relative pathname.

     What is the current working directory?  Every user who is giving commands
to UNIX is "inside" some directory, and the commands that the user types in
usually refer to files inside that directory.  This is the current working 
directory for that user and can be discovered by using the command:

     % pwd

which actually stands for "print working directory".

     When you log in, you are always placed in your home directory, which is
the root of the mini-tree that belongs to you.  In the above examples, alice's
home directory was /usr/alice.  Joe's was /usr/joe.  At Canisius, yours will 
likely be something like

     /mnt/stud/tim

if your username is "tim".

     UNIX uses the current working directory as a shorthand for pathnames,
since it would drive users crazy to have to type in a full pathname every
time they wanted to do something to a file.  If you type in a relative path-
name (one that does not begin with a slash), then UNIX prepends the current
working directory.  Thus, alice could access her files easily by using their
names like "homewrk" and "notes".  UNIX will put her current working directory,
which is /usr/alice, in front of relative pathnames to get names like 
/usr/alice/homewrk and /usr/alice/notes.

     If alice goes into her "assigns" directory, then her current working
directory becomes /usr/alice/assigns, and she can use the names 1, 2 and 2.old
to access her files there.
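
     The prepending rule is easy to try out for yourself.  The following is
only a sketch and not part of the original lesson:  it assumes a Bourne-style
shell (sh) and the mktemp command, and it builds a throwaway copy of alice's
directory in a scratch area (the "sandbox" name is made up) instead of touching
the real /usr.

```shell
# Build a scratch copy of alice's directory (hypothetical sandbox location).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/usr/alice/assigns"
echo "problem set" > "$sandbox/usr/alice/homewrk"
echo "first draft" > "$sandbox/usr/alice/assigns/1"

# Make alice's directory the current working directory.
cd "$sandbox/usr/alice"

# A relative pathname and the matching absolute pathname reach the same file.
relative=$(cat homewrk)
absolute=$(cat "$sandbox/usr/alice/homewrk")

# After going into assigns, the simple name "1" refers to assigns/1.
cd assigns
inside=$(cat 1)

echo "relative: $relative"
echo "absolute: $absolute"
echo "inside assigns: $inside"

# Clean up the scratch area.
cd /
rm -rf "$sandbox"
```

The first two lines printed should show the same contents, because UNIX turns
"homewrk" into the full pathname before it looks for the file.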

     The tree-structure directory system of UNIX is like many others in 
computers today, including MS-DOS and the Macintosh.  Directories may contain 
ordinary files or other directories.  In the MS-DOS system, which is used on
IBM PCs and clones, the directory separator is the backslash, so pathnames
look weird to UNIX users:

     A:\csc230\homeworks\jan27.txt

Also, MS-DOS uses a drive indicator to tell which disk drive the file is on.
This takes the form of a letter followed by a colon, like A:, B:, C:, etc.
UNIX has no such drive indicator.

     In the Macintosh world, a directory is called a folder, but the concept
is identical.  Pathnames also exist in the Macintosh, with the directory
separator being the colon.


Moving around in the file tree
------------------------------

     We often use the metaphor of a directory as a room or a space in which
the user is temporarily sitting.  What really happens is that UNIX keeps track
of the current working directory and prepends it to whatever the user types in
when that input is an incomplete (relative) pathname.

     To change the current working directory, use the "cd" command:

        cd assigns

This actually adds "assigns" to the end of the current working directory.  If 
you do pwd now, you should see the directory name "assigns" at the right end. 
We also call this "going into the directory," again using the metaphor of a
directory being a room or a space.

     A directory is called the parent of its children, which are the files and
directories logically located in it.  The parent directory of the current
working directory is always called ".." or "dot dot".  Thus, to return to the
parent directory, type

        cd ..

This is true no matter what your current working directory is -- the parent is
ALWAYS called .. and you can always get back to it by doing "cd ..".  In fact,
you could get all the way back to the root directory by doing "cd .." repeated-
ly.
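
     Here is a short sketch of going into a directory and coming back out.  It
assumes a Bourne-style shell (sh) and the mktemp command, and it works in a
scratch directory whose name ("sandbox") is invented for the example.

```shell
# Scratch area with a directory "alice" that contains "assigns".
sandbox=$(mktemp -d)
mkdir -p "$sandbox/alice/assigns"

cd "$sandbox/alice"
start=$(pwd)

# Go into assigns: its name is added to the right end of the working directory.
cd assigns
inside=$(pwd)

# Go back to the parent.
cd ..
back=$(pwd)

# At the root, ".." leads nowhere new; you simply stay at the root.
top=$(cd / && cd .. && pwd)

echo "$start"
echo "$inside"
echo "$back"
echo "$top"

# Clean up the scratch area.
cd /
rm -rf "$sandbox"
```

The second line printed is the first with "/assigns" added, the third matches
the first again, and the last is a lone slash.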

     Another special file name is

        .

which stands for the current directory, whatever that is.  You can see the 
current value of dot by using the pwd command.

     Dot may not seem particularly useful right now, but it does have some
real uses.  One of them is to refer to a file in the current working directory:

        ./somefile

But doesn't the current working directory automatically get prepended to any
relative pathname?  Yes, except when a simple name is typed as a command, in
which case UNIX looks for it along a search path, a list of directories that
may not include the current one.  The dot prevents the search path from being
used and forces the current directory to be used.
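
     To see dot at work, here is a small sketch.  It assumes a Bourne-style
shell (sh), the mktemp command, and a scratch area that allows programs to be
run from it.  The command name "hello" is made up for the example.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Create a tiny command of our own in the current directory.
printf '#!/bin/sh\necho "hello from the current directory"\n' > hello
chmod +x hello

# The leading "./" names the file in the current directory directly,
# so the search path is not consulted.
result=$(./hello)
echo "$result"

# Clean up the scratch area.
cd /
rm -rf "$sandbox"
```

Whether plain "hello" (without the "./") would also work depends on how your
search path is set up, which is why the dot form is the dependable one.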

     As a convenience to the user, UNIX provides a quick way to specify your
home directory, the one you get when you log in.  Your home directory can be
specified quickly and simply by the tilde character.
For example:

     % ls ~

lists the files in your home directory, no matter where you are.  You can also 
use this in a pathname, such as

     % cat ~/junk

As an extra convenience, you can refer to any other user's home directory by
the tilde if you put their username after it:

     % ls ~joe

This lists joe's home directory, if it is open to public inspection.
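
     The tilde can be explored without disturbing your real home directory.
This sketch assumes a Bourne-style shell (sh), where the tilde is filled in
from a variable called HOME; that detail, the mktemp command and the scratch
directory are assumptions of the example, not part of the lesson above.

```shell
# Point HOME at a scratch directory so the real home directory is untouched.
old_home=$HOME
sandbox=$(mktemp -d)
HOME=$sandbox
echo "old notes" > "$HOME/junk"

# Go somewhere else entirely; the tilde still leads home.
cd /
home_dir=$(echo ~)          # the tilde by itself
junk_path=$(echo ~/junk)    # the tilde at the front of a pathname
contents=$(cat ~/junk)

echo "$home_dir"
echo "$junk_path"
echo "$contents"

# Put things back the way they were.
HOME=$old_home
rm -rf "$sandbox"
```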


Making and destroying directories
---------------------------------

     To make a directory, use mkdir:

     % mkdir myclass

This makes the directory in the current working directory.  Remember to "cd
into" that new directory if you want to start putting stuff in it.

     Directories can only be destroyed if they are empty.  To remove an empty 
directory, use rmdir:

     % rmdir myclass

WARNING:  We are now about to tell you a little trick that may make things go
faster for you, but which may also ruin your semester!  USE WITH CARE!
There is a shortcut for deleting a directory that is not empty.  In fact,
the directory may contain other directories, each of which is non-empty.
Here's how to do it in one fell swoop:

     % rm -rf somedir

The options are: r -- recursive, delete all subdirectories recursively; and
f -- force, delete all files without asking for confirmation and without error
messages.  IF YOU DELETE A DIRECTORY (or a file) YOU CANNOT GET IT BACK!  So
please be careful!!!
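
     A safe way to get a feel for these commands is to practice in a scratch
directory.  The sketch below assumes a Bourne-style shell (sh) and the mktemp
command; the file name "notes" is invented for the example.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Make a directory and put one file in it.
mkdir myclass
echo "lecture 1" > myclass/notes

# rmdir refuses, because myclass is not empty.
if rmdir myclass 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi

# rm -r removes the directory together with everything inside it.
rm -r myclass
if [ -d myclass ]; then
    after_rm="still there"
else
    after_rm="gone"
fi

echo "rmdir: $rmdir_result"
echo "after rm -r: $after_rm"

# Clean up the scratch area.
cd /
rm -rf "$sandbox"
```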


More about ls
-------------

     The ls command is intimately connected with directories, since it shows the
contents of a directory.  If ls is used alone, then it prints out a list of the
files in the current working directory:

     % ls

The format of the output is usually a multi-column display of the names only.
Sometimes it is useful to see quickly which items are directories and which
are files.  To do this, use 

     % ls -F

This puts a slash after each directory name.  There are a few other characters
that it uses, such as * to indicate which files are executable.
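
     Here is a sketch of the -F markers in action.  It assumes a Bourne-style
shell (sh) and the mktemp command, and the names "assigns", "notes" and "run"
are just examples.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir assigns              # a directory
echo "some notes" > notes  # an ordinary file
: > run                    # an empty file ...
chmod +x run               # ... marked as executable

# Capture the listing; ls prints one name per line when not writing to a screen.
listing=$(ls -F)
echo "$listing"

# Clean up the scratch area.
cd /
rm -rf "$sandbox"
```

The directory should show up with a slash after it, the executable file with
an asterisk, and the ordinary file with nothing extra.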

     Another useful option on the ls command is -l which stands for long
output.  This prints out not only the file names but a lot of other information
as well.  Let's look at an example:

        -rw-------  1 meyer        3713 Jan 11 14:25 syllabus.dit
        -rw-------  1 meyer        2393 Jan 11 12:30 termpaper.dit

The first string of characters is the privileges on the file, which will be
discussed later.  Next comes the number of links, followed by the username of
the owner.  The four-digit numbers are the file sizes in bytes.  These files
are 3,713 and 2,393 bytes long, respectively.  Next comes the date and time
the file was last modified, followed by the file name.

     You can also use ls with an absolute pathname to list what is in that
directory.  Here's an example which lists the contents of the root directory:

     % ls /

You can combine options with filenames:

     % ls -F /usr/alice

     If you give a single file name after the ls command, UNIX will tell you
whether that file exists or not.  If it does exist, it just echoes back the
name.  If it doesn't exist, UNIX will print a complaint such as "filename not
found" (the exact wording varies from system to system).
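
     When a script rather than a person needs the answer, it is more reliable
to look at whether ls succeeded than to read its message.  This is a sketch
under the usual assumptions (a Bourne-style shell and the mktemp command); the
name "nosuchfile" is simply a file that is known not to exist.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
echo "data" > homewrk

# ls succeeds for a file that exists ...
if ls homewrk >/dev/null 2>&1; then
    found_homewrk="yes"
else
    found_homewrk="no"
fi

# ... and fails for one that does not.
if ls nosuchfile >/dev/null 2>&1; then
    found_missing="yes"
else
    found_missing="no"
fi

echo "homewrk exists: $found_homewrk"
echo "nosuchfile exists: $found_missing"

# Clean up the scratch area.
cd /
rm -rf "$sandbox"
```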

     Another handy feature of ls is its recursive option.  Recursion is a
technique which is widely used in Computer Science, and it basically means
apply the same operation to all the subcomponents.  For example, if a directory
contains not only files but other directories, it may be nice to get a complete
listing of everything in the directory.  For this very purpose -R exists as an
ls option.  Here's an example:

     % ls -Rl /usr/alice

This would print

     drwx------  1 alice         512 Jan 11 14:25 assigns
     -rw-------  1 alice         239 Jan 11 12:30 homewrk 
     -rw-------  1 alice        5732 Jan 11 12:30 notes

     assigns:

     -rw-------  1 alice       10352 Jan 11 12:30 1
     -rw-------  1 alice          76 Jan 11 12:30 2
     -rw-------  1 alice          57 Jan 11 12:30 2.old

Notice that a directory has a "d" at the front of its long listing.  Also
notice that the R and l options may be combined after a single dash in the
ls command.  An equivalent formulation would be  ls -R -l ... but this is
not necessary.  (Such combining of options is not always possible in other
UNIX commands, so read the man pages.  This is one of the failings of UNIX --
it is inconsistent.)

     Another handy combination of options is

     % ls -RC /usr/alice

which lists the files in multi-column format and also recursively.
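
     You can watch the recursion happen on a small tree of your own.  The
sketch below rebuilds alice's directory in a scratch area; it assumes a
Bourne-style shell (sh) and the mktemp command, and the exact headings that ls
prints above each directory may differ a little between systems.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/alice/assigns"
cd "$sandbox/alice"

# Two files at the top, three more inside assigns.
echo a > homewrk
echo b > notes
echo c > assigns/1
echo d > assigns/2
echo e > assigns/2.old

# List this directory and everything below it.
listing=$(ls -R)
echo "$listing"

# Clean up the scratch area.
cd /
rm -rf "$sandbox"
```

The files inside assigns appear under a heading that ends in "assigns:", even
though only the top directory was asked for.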


Moving files around
-------------------

     In a previous lesson, the "mv" command was shown as the way to rename
files.  Its real name is "move" and that is another of its functions since it
can change the location of a file by moving it to another directory.

     The "mv" command has the following format

       mv name1 name2

UNIX treats this as a rename when "name2" is not the name of an existing
directory.  (Be careful: if "name2" is an existing file, that file is
replaced.)  If, however, "name2" is the name of a directory that already
exists, then the file "name1" is moved into that directory.  It still retains
its old name in the new directory.  Here's how alice might move her notes into
the assigns directory:

     % mv notes assigns

If she wants to rename homewrk, she would do:

     % mv homewrk homework

     Move can also relocate entire directories:  "name1" does not need to be 
the name of a single file; it can be an entire directory.  Again, if "name2"
is not the name of anything currently existing, then the directory would get
renamed.  Here's how alice might rename assigns:

     % mv assigns assignments

On the other hand, she might make a directory called "last-year" and move it
into assigns:

     % mkdir last-year
     % mv last-year assigns
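
     The two behaviors of mv can be compared side by side in a scratch
directory.  This is only a sketch; it assumes a Bourne-style shell (sh) and the
mktemp command, and it runs alice's four examples one after another.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir assigns
echo "my notes" > notes
echo "problem set" > homewrk

# assigns is an existing directory, so notes is moved into it.
mv notes assigns

# homework does not exist yet, so this is a rename.
mv homewrk homework

# The same rule applies to directories: assignments does not exist, so rename.
mv assigns assignments

# assignments now exists, so last-year is moved into it.
mkdir last-year
mv last-year assignments

# Record where everything ended up.
moved=$(cat assignments/notes)
renamed=$(cat homework)
if [ -d assignments/last-year ]; then nested="yes"; else nested="no"; fi
if [ -e notes ] || [ -e homewrk ] || [ -e assigns ]; then
    old_names="still there"
else
    old_names="gone"
fi

echo "$moved / $renamed / nested: $nested / old names: $old_names"

# Clean up the scratch area.
cd /
rm -rf "$sandbox"
```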


Copying directories
-------------------

     Have you ever started to tinker with something and then wished you hadn't
because you screwed it up?  This happens with distressing frequency in the 
world of computers, and preventing this usually requires making a backup copy
before you start to tinker.

     Directories are often made so that all the files of a related topic may
be clumped together, like all your assignments, or all your notes, or all the
files relating to a program.  To keep things from getting confused, it is a
good idea to make a backup copy of an entire directory, which could be tedious
if you had to copy it file by file.

     UNIX foresaw this need and added some options to the "cp" command.  To
copy an entire directory, use the -r option:

     % cp -r prog1 prog1.backup

It is always a good idea to name your directories in obvious ways.  In this
example, the user wanted to back up the prog1 directory, so they called the
backup "prog1.backup".  This is easy to do in UNIX because the "file extension"
(what follows the dot in the file name) is not bound by silly rules as it is
in MS-DOS.  In fact, you can have more than one dot in a file name.
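
     To convince yourself that the backup really is a separate copy, try a
sketch like the following.  It assumes a Bourne-style shell (sh) and the mktemp
command, and the file name "main.c" is invented for the example.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# A directory with one file in it.
mkdir prog1
echo "version 1" > prog1/main.c

# Copy the whole directory before tinkering.
cp -r prog1 prog1.backup

# Now tinker with the original (and break it).
echo "version 2, broken" > prog1/main.c

original=$(cat prog1/main.c)
backup=$(cat prog1.backup/main.c)

echo "original: $original"
echo "backup:   $backup"

# Clean up the scratch area.
cd /
rm -rf "$sandbox"
```

The backup still holds the old contents, so the damage can be undone by copying
the file back.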


Linking directories
-------------------

     After you become more advanced in UNIX, you will discover that sometimes
you want a directory to be visible from several different places in the file
tree.  But to avoid wasting space, you do not want separate copies but rather
some way to make a pointer to a directory so that it looks like it resides in
two different places.

     To motivate this, let's use the following scenario.  There is a program
called "bomb" which you've written and documented.  The help documents are very
long so you've put them into separate files in a directory, and the directory
is in the "bomb" directory.  But you also want to gather together all the help
documents for all your programs and put them into a convenient place for people
to see.  (The source code to "bomb" must be kept secret.)

     To make this "ghost" copy of the "bomb" documentation, use the link
command, which is "ln".  When you use "ln" just pretend that you are using
"cp" since they work the same way (but do not have the same options).

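     % ln -s `pwd`/documentation ~/HELP/bomb-documentation

In this example, the user is in the "bomb" directory, where there is another
directory called "documentation".  The user wants this directory to appear in
the HELP directory as well, but under a different name so as to distinguish
it from other documentation.  The `pwd` in backquotes inserts the current
working directory, so that the link records an absolute pathname.  This matters
because a relative name stored in a symbolic link is looked up starting from
the directory that holds the link (here HELP), not from the directory where
you typed the command.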

     The -s is a required option.  It stands for "symbolic" because this kind
of directory linking uses what is called symbolic links rather than hard links.
This is kind of an advanced topic which is beyond the scope of this tutorial.
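
     Here is a sketch of the "bomb" scenario in a scratch area.  It assumes a
Bourne-style shell (sh) and the mktemp command; the file names "intro" and
"faq" are made up.  It gives ln an absolute pathname for the directory being
linked, since a relative name would be looked up starting from the directory
that holds the link.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/bomb/documentation" "$sandbox/HELP"
echo "how to use bomb" > "$sandbox/bomb/documentation/intro"

# Make the documentation directory visible inside HELP under a new name.
ln -s "$sandbox/bomb/documentation" "$sandbox/HELP/bomb-documentation"

# The files can be read through the link ...
via_link=$(cat "$sandbox/HELP/bomb-documentation/intro")

# ... and a file added to the original shows up through the link as well,
# because there is only one real directory.
echo "more help" > "$sandbox/bomb/documentation/faq"
if [ -f "$sandbox/HELP/bomb-documentation/faq" ]; then
    shared="yes"
else
    shared="no"
fi

# The new name is a symbolic link, not a second copy.
if [ -L "$sandbox/HELP/bomb-documentation" ]; then
    kind="symbolic link"
else
    kind="something else"
fi

echo "$via_link / shared: $shared / kind: $kind"

# Clean up the scratch area.
rm -rf "$sandbox"
```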