(Originally published at radsoft.net August 2001.)
'Unix is not so much an operating system as a way of thinking.'
- Old Computer Science Proverb
First off, Unix has only one root, and it is denoted by a forward slash, not a backward slash (yes, Billg just wanted to be different). But, contrary to the scheme used by CP/M and thereafter adopted by MS-DOS, there are no drive letters. All physical drives are mounted and mapped under the same single root.
Probably the most basic command in the Unix arsenal is ls. When you try to remember what all these cryptic commands mean, try to remember that Ken Thompson, architect of most of them, was extremely cryptic, both by nature and for a very good reason: they were using teletype terminals back then and wanted to minimize output. Also, once you've got to know them better, you will appreciate their brevity: it means far less to type.
ls is the directory list command (that's right - pronounce 'ls' as 'list' and you'll get the hang of it). How the ls command is implemented varies from system to system, but at the root of ls is the essence of the Unix file system: the inode.
It is essential to appreciate the elegance of the build-up of the Unix file system: directories do not contain file information and other junk as do MS-DOS directories; they only contain file names and inode numbers. The inode number is an index into file information which is housed separately. All a file entry in a directory has is the file name itself and its inode number.
Unix does not admit of file extensions either, for that matter: if you want a dot ('.') in a file name - fine. If you want several in a row, fine too. (The only limitations are for '.' and '..' which always represent the current and the parent directories.)
It is likewise important to understand that, because the Unix file system is built up around inodes, the name given to any file is relatively unimportant. The one-to-one correspondence between a file name and a disk allocation found on Microsoft systems is replaced by a one-to-one correspondence between the inode and the disk allocation on Unix.
A file name is merely a link to an inode; and because the file name is merely a link, any number of similar links may exist. Thus it is perfectly possible (and quite common) for any one single physical file to have different names found in different directories in a Unix file system. The file itself will continue to exist as long as at least one link - one file name - is associated with its inode.
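The relationship between names and inodes can be tried out directly. What follows is only a minimal sketch in Python, assuming a file system that supports hard links; the file names are made up for the illustration.

```python
import os
import tempfile


def link_demo():
    """Give one file a second name, then remove the first name."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.txt")    # hypothetical names
        second = os.path.join(tmp, "second.txt")

        with open(first, "w") as f:
            f.write("hello\n")

        os.link(first, second)       # same effect as: ln first.txt second.txt

        a, b = os.stat(first), os.stat(second)
        same_inode = a.st_ino == b.st_ino
        links_before = a.st_nlink    # number of names pointing at the inode

        os.remove(first)             # drop one name; the file itself survives
        with open(second) as f:
            content = f.read()
        links_after = os.stat(second).st_nlink

        return same_inode, links_before, links_after, content


print(link_demo())
```

Both names report the same inode number, and the data stays readable through the second name after the first has been removed.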
Unix was from the outset a multi-user system, so its file attributes take this into account. A typical listing for a Unix file will begin with a string of the following form:
-rwxrwxrwx
Here the initial character is set to 'd' if the file is a directory (Unix does not distinguish between files and directories - they're all files) and the next three groupings of 'rwx' are for 'user', 'group' and 'other'.
'user' - the owner of the file - the user who created it.
'group' - the user group associated with the file (normally one the owner belongs to).
'other' - everybody else.
Only the 'user' (the owner) of a file, or the superuser, can change these attributes (by using the chmod ['change mode'] command - more on this command later).
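To see how the 'user', 'group' and 'other' groupings come out for a real file, a short Python sketch can help. The file name and the mode 0o640 are arbitrary choices for the illustration; stat.filemode is the standard library call that renders the mode bits the way ls does.

```python
import os
import stat
import tempfile


def mode_string(path):
    """Return the ls-style mode string for path, e.g. '-rw-r--r--'."""
    return stat.filemode(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")   # hypothetical file name
    open(path, "w").close()
    os.chmod(path, 0o640)      # user: rw-, group: r--, other: ---
    print(mode_string(path))   # -rw-r-----
    print(mode_string(tmp))    # a directory, so the string starts with 'd'
```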
Unix also admits of three file times:
atime - time the file was last accessed.
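ctime - time the file's inode was last changed (ownership, permissions, link count and so on). Classic Unix does not record a creation time.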
mtime - when the file was last modified.
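The three times can be read back with a few lines of Python. This is a sketch only: the file name is invented, and the backdating is there to show that on most Unix systems ctime follows changes to the inode and cannot be set by the user.

```python
import os
import tempfile


def file_times(path):
    """Return the three classic timestamps as seconds since the epoch."""
    st = os.stat(path)
    return {"atime": st.st_atime, "mtime": st.st_mtime, "ctime": st.st_ctime}


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")   # hypothetical file name
        with open(path, "w") as f:
            f.write("draft\n")
        # Backdate atime and mtime to an arbitrary moment in 2001.
        # The call itself changes the inode, so ctime becomes "now".
        os.utime(path, (1_000_000_000, 1_000_000_000))
        return file_times(path)


print(demo())
```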
Back to ls
Armed with all that, it might be possible to go back now to that most basic of all commands - ls.
By default ls will list very little. What you need to do is check which command line switches work on your system, and what you will run into can be nothing short of amazing. ls should be able to list the file names, their inodes, their attributes, their atimes, ctimes, mtimes and a whole lot more.
As the directory architecture is not burdened by a static file entry structure of fixed size, the inode is free to hold whatever information the file system needs. In addition to the attributes, the inode contains the references needed by the drivers to actually access the file, in other words its disk locations. Smaller files have their blocks listed directly in the inode, while larger and larger files use more and more levels of indirection.
Further, the Unix file drivers are written so as to defer and reorder disk operations ('lazy write'): the driver keeps track of where the controller is positioned after any completed operation, figures out what pending operation is closest by and sends that one on to the controller - all in an effort to speed up overall disk operations and reduce disk wear and tear.
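The direct versus indirect scheme is easiest to follow with numbers. The sketch below assumes an illustrative layout of 512-byte blocks, 10 direct block pointers in the inode and 128 pointers per indirect block; these figures are assumptions chosen for the example, and real file systems differ.

```python
def indirection_level(file_size, block_size=512, direct=10, ptrs_per_block=128):
    """Return how many levels of indirection a file of file_size bytes needs.

    0 = direct pointers only, 1 = single indirect, 2 = double, 3 = triple.
    All layout parameters are hypothetical defaults, not any real system's.
    """
    blocks = -(-file_size // block_size)      # ceiling division
    limit = direct                            # blocks reachable so far
    for level in range(4):
        if blocks <= limit:
            return level
        limit += ptrs_per_block ** (level + 1)
    raise ValueError("file too large for this layout")


for size in (100, 5_120, 5_121, 70_656, 70_657):
    print(size, indirection_level(size))
```

With these assumed figures a file of up to 5,120 bytes is reached directly, and one more byte forces the first level of indirection.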
The most universal form of expanding the ls command is:
But there are additional switches for listing the inodes and all the rest of that nifty stuff.
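What such a listing draws on can be sketched in Python: the directory supplies the names and inode numbers, and everything else comes from the inode. The function name list_dir is made up for this example, and the output format is only loosely modelled on ls.

```python
import os
import stat
import tempfile


def list_dir(path):
    """Return (inode, mode string, name) for each entry, sorted by name."""
    rows = []
    with os.scandir(path) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            rows.append((st.st_ino, stat.filemode(st.st_mode), entry.name))
    return sorted(rows, key=lambda row: row[2])


for inode, mode, name in list_dir("/"):
    print(f"{inode:>10} {mode} {name}")
```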
Where You Find Stuff
As Unix has only one file system root, all physical drives must be mounted under that same root - or most often farther down the hierarchy tree (even the floppy drive falls into this order of things). It's quite common to have the most essential files on the primary drive: this might be the only drive accessible at boot, as the system files themselves must see that all additional drives are mounted as the boot proceeds.
There are as well a number of standard directories found right under root which might be interesting to look at.
/bin - Can often contain most of the executables ('binaries') used in the basic system setup.
/etc - Contains the passwd file and later derivatives.
/usr - Often a separate drive mounted onto the primary drive, this directory can begin the tree for user-specific files. Sub-directories such as /usr/bin are common.
The actual operating system files might be found either in the root itself or in one of its direct sub-directories.
Common Command Stuff
Now take a look at some more basic Unix commands.
mv is the move command: it's an actual executable file which takes care of moving files from one directory to another. As with MS-DOS systems, moving normally does not entail moving the file itself, but only its references. The only time mv will actually deal with file copying is when the source and destination are not on the same file system (for example, on different physical disks).
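That a move within one file system only touches the references can be checked by comparing inode numbers before and after. A small Python sketch, with invented directory and file names:

```python
import os
import tempfile


def move_keeps_inode():
    """Move a file between two directories and compare inode numbers."""
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = os.path.join(tmp, "a")      # hypothetical directories
        dst_dir = os.path.join(tmp, "b")
        os.mkdir(src_dir)
        os.mkdir(dst_dir)

        src = os.path.join(src_dir, "report.txt")
        dst = os.path.join(dst_dir, "report.txt")
        with open(src, "w") as f:
            f.write("data\n")

        before = os.stat(src).st_ino
        os.rename(src, dst)                   # what mv does on one file system
        after = os.stat(dst).st_ino

        return before == after, os.path.exists(src)


print(move_keeps_inode())
```

The inode number is unchanged and the old name is gone: only the directory entries were rewritten.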
cp is the copy command: it will actually perform a physical file copy. The file will exist in two separate places.
rm is the remove command and is used for deleting files only.
cd (chdir on older systems and in some shells) changes your working directory; mkdir makes directories; and rmdir removes them.
pwd prints your working directory (on your screen).
ln creates an additional link to an existing file (an additional file name in a directory).
file attempts to figure out what file types the files you give it have (run it as 'file *' to cover the contents of a directory). As Unix does not rely on file extensions to determine file type, the existence of 'magics' in these files (as also found in Microsoft files) becomes all-important.
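The idea behind 'magics' is simply to look at the first few bytes of a file. The Python sketch below uses a tiny hand-picked table of four well-known signatures; the table and the function name guess_type are made up for the illustration, and a real system consults a far larger database.

```python
# A hypothetical, deliberately tiny magic table: leading bytes -> description.
MAGICS = {
    b"\x7fELF": "ELF executable",
    b"#!": "script with interpreter line",
    b"\x1f\x8b": "gzip compressed data",
    b"%PDF": "PDF document",
}


def guess_type(header):
    """Guess a file type from the first bytes of its contents."""
    for magic, description in MAGICS.items():
        if header.startswith(magic):
            return description
    return "unknown"


def guess_file(path):
    """Read just enough of the file to compare against the table."""
    with open(path, "rb") as f:
        return guess_type(f.read(8))


print(guess_type(b"#!/bin/sh\n"))
```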
The BPOC on a Unix system is the superuser or root. Traditionally this is user 0 in user group 0, although security may be tighter on your box and be configured a bit differently. The superuser will have access to everything on the system even though this person's login is governed just as any other.
Basically any user defined as user 0 in user group 0 will have the same status, and just as with the file system, user names need not be unique.
In the early years of Unix all the user login information was contained in the file /etc/passwd. This proved far too easy to crack as time went on, necessitating more and more sophisticated schemes. The classic scheme, devised by Robert Morris and Ken Thompson, used 4,096 deliberately altered variations of DES here.
When a user's password was set, the system chose one of these variations, ran the algorithm with the password as its key, and put both the choice and the result in /etc/passwd. As the password was the key rather than the text being encrypted, and the algorithms were deliberately 'faulted', the process was not reversible.
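The principle of picking one of 4,096 variants and storing the choice next to the result can be sketched in Python. Note loudly that this is not the historical algorithm: SHA-256 is swapped in for the altered DES purely for illustration, the function names are invented, and nothing here should be taken as a password scheme fit for modern use.

```python
import hashlib
import secrets


def _digest(variant, password):
    """One-way function; the variant number perturbs the result."""
    return hashlib.sha256(f"{variant:04d}:{password}".encode()).hexdigest()


def make_entry(password):
    """Pick one of 4,096 variants at random and store it with the result."""
    variant = secrets.randbelow(4096)
    return variant, _digest(variant, password)


def check(entry, attempt):
    """Repeat the computation with the stored variant and compare."""
    variant, stored = entry
    return secrets.compare_digest(stored, _digest(variant, attempt))


entry = make_entry("open sesame")
print(check(entry, "open sesame"), check(entry, "guess"))
```

Logging in never requires reversing anything: the system repeats the computation on what was typed and compares the results.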
(Don't laugh: IBM mainframes all the way up to but not including MVS/XA - the mainstay of the computing world for almost thirty years - used a simple one-on-one encryption system which was reversible. Cracking a user account took a matter of seconds.)
Users of stand-alone Unix boxes can operate as the superuser if they wish; however this is normally not to be advised, especially not when connected to the Internet. Processes created by the superuser account carry the same status and may ease the ability of intrusion programs to access the system.
One of Ken Thompson's cornerstone concepts was that processes be unable to communicate with one another save to know of each other's existence. Classic Unix has next to no inter-process communication: beyond signals and pipes, processes keep to themselves. When one process - such as the user's login - wishes to create another process, something called fork starts to happen.
The 'fork' is as a fork in the road, so to speak. The process wishing to create a new process - e.g. you want to start the ls command - first creates a copy of itself, and then this copy morphs into the command you issued. Your initial process will then wait and hang on the completion of the 'child' process. While Microsoft Windows does not officially admit of a family hierarchy between processes, classic Unix - as well as MS-DOS to a great extent - does.
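The copy, morph and wait sequence maps onto three system calls, which Python exposes as os.fork, os.execvp and os.waitpid. The sketch below is a bare-bones illustration for Unix systems only; the function name run and the exit code 127 for a command that cannot be started are conventions borrowed from the shells, not anything the text prescribes.

```python
import os
import sys


def run(argv):
    """Fork, turn the child into argv, wait for it, return its exit status."""
    pid = os.fork()                    # the process creates a copy of itself
    if pid == 0:
        # Child: morph into the requested command.
        try:
            os.execvp(argv[0], argv)   # does not return on success
        except OSError:
            os._exit(127)              # command could not be started
    # Parent: hang on the completion of the child.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)        # child was killed by a signal


sys.stdout.flush()
print("exit status:", run(["ls", "/"]))
```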
ps is the process status command. Just as ls, it will admit of a plethora of command line switches which will vary from system to system.
With classic Unix it is possible to trace one's own login all the way back to process 0 or the process table, and every process has a number, or process ID. ps will list the process IDs for the parent processes as well.
kill is the command used to kill processes - with varying amounts of prejudice. kill -9 is traditionally the meanest kind of kill available. A cute trick is to find your own login process number in the ps table and then kill it (can you guess what happens?).
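What kill does underneath is send a numbered signal to a process ID. A Python sketch, using a throwaway child process that merely sleeps (the child and the function name kill_child are inventions for the example; try it on your own login at your peril):

```python
import os
import signal
import subprocess
import sys


def kill_child(sig=signal.SIGKILL):
    """Start a sleeping child, send it a signal, report how it ended."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    os.kill(child.pid, sig)    # kill -9 <pid> when sig is SIGKILL
    return child.wait()        # negative value: terminated by that signal


print(kill_child())                  # SIGKILL is signal 9
print(kill_child(signal.SIGTERM))    # the politer default of the kill command
```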
As all processes hang and wait on other processes, so does the process that processed your login. It simply waits until you decide to log out again. Your logging out is the termination of the process which the login created for you after you validated your ID and password.
When login notices that this process has exited, it continues execution itself - looping around and displaying a new login screen.
telnet can be used to log in to remote systems. Provided you have superuser rights on these remote systems, you have the ability to kill any processes running on them - including other user logins. A quick look at the ps process table will tell you immediately who is logged in.
nice is the command for creating processes with a priority other than the default. Priorities typically run from 19 (20 on some systems) to -20, with -20 being the highest. Normally processes are created at priority 0 - right in the middle. If you want a process to use very little CPU time, you give it a positive priority; if you want it to use lots of CPU time, you give it a negative priority (which normally only the superuser may do).
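The effect of a positive priority can be observed from inside a process. The Python sketch below forks a child, has it lower its own priority with os.nice, and reports the resulting value back through a pipe; the function name and the reporting mechanism are choices made for this example.

```python
import os


def niceness_after(increment):
    """Fork a child, add increment to its nice value, return the result."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: adjust own priority and report the new value to the parent.
        try:
            os.close(r)
            os.write(w, str(os.nice(increment)).encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 32)
    os.close(r)
    os.waitpid(pid, 0)
    return int(data)


print("current:", os.nice(0), "child after +5:", niceness_after(5))
```

Only the child is affected; the parent carries on at its original priority.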
Getting In & Getting Out
The most basic Unix user exercise is to log in to your system and successfully log out again. To log in you must validate your account with both an ID and a password; to log out you need only to hit Ctrl+D. Ctrl+D means 'end of file' in Unix, and 'saying' this essentially means 'no more input', so your login process exits (Microsoft Windows will admit of a similar behavior: try Alt+F4 - 'close window' - on the Windows desktop and see what happens).
Next you might try navigating around the file system. Your login might put you anywhere to start with; get yourself up to the root ('/') by successive 'cd ..' commands. Then list ('ls') what directories you have there, and take a peek in each. It can be quite a complicated tree.
The beauty of Unix is how its architects insisted on everything being done correctly and cleanly. For example, Ken Thompson once quipped: 'Keep your hands off the drivers!' - an admonition which the developers of Internet Explorer have ignored.
The concept of the shell as a command interpreter was also adhered to in a precise way. Contrary to the absurd way things are done on Microsoft systems, Unix command interpreters do not tell you the time or copy or move files or percolate your coffee - they only interpret commands. And thus they are many times more powerful.
The classic Unix shell - sh - was written by Steve Bourne, replacing Ken Thompson's original shell. Subsequent shells are the Berkeley C-shell - csh - and the Korn shell - ksh. You can specify in your login script which shell you wish to start with and may at any time invoke another. Even in its most basic use, Unix and its shells run rings around anything that can be done on a Microsoft box - NT+ (cmd.exe) or otherwise.
Writing shell scripts is for many an art: many tasks can be performed in minutes with a shell script instead of taking the time to write, compile and link code. The Unix shells all have control flow - branching, looping etc. - to make this very easy indeed.
Pipes were invented by Doug McIlroy, a colleague (and officially the boss) of Thompson and Ritchie. Unix pipes work a lot differently from MS-DOS pipes, which are in comparison clumsy and ineffective. The beauty of the Unix implementation is that a pipe is a RAM buffer managed by the kernel and not the absurd temporary disk file used by MS-DOS, so that the commands in a pipeline run side by side and operations proceed swiftly and effectively.
The whole idea of Unix is to take the tools you have on disk and find ways of combining them in a pipeline or shell script or both to get the effect you want.
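How a pipeline hangs together can be sketched in Python with the standard subprocess module, which connects each command's output to the next command's input through a kernel pipe. The function name pipeline is made up, and the commands in the example (printf, sort, head) are assumed to be installed, as they are on practically any Unix box.

```python
import subprocess


def pipeline(*commands):
    """Run commands left to right, each reading the previous one's output."""
    procs = []
    prev_out = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=prev_out, stdout=subprocess.PIPE)
        if prev_out is not None:
            # Close our copy so the upstream command gets SIGPIPE
            # if the downstream command exits early.
            prev_out.close()
        prev_out = proc.stdout
        procs.append(proc)
    output = prev_out.read()
    prev_out.close()
    for proc in procs:
        proc.wait()
    return output.decode()


# Roughly: printf 'b\na\nc\n' | sort | head -n 2
print(pipeline(["printf", "b\\na\\nc\\n"], ["sort"], ["head", "-n", "2"]))
```

No temporary file is ever written; all the commands run at once and the data flows between them through memory.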