|Home » Learning Curve
Going Going Gone
File systems and how they work. It's really simple. Really it is.
This article is about file systems, disk drivers, disk controllers, and how it all hangs together. The idea is to present it in accurate terms still simple enough for laypersons to understand. The best place to start is at the beginning - which in this case is as low down as it gets.
The Hard Drive
Hard drives are the principal instance of secondary storage on personal computers. Secondary storage is the term used to describe storage that's persistent - that holds even after a program (or the system itself) stops running. Early mainframes didn't have quite the problem personal computers have - using CPUs with ferrite cores which retained values even without a power source.
Personal computers don't retain values without a power supply, so the use of secondary storage is not only prudent, it's necessary as well.
Hard drives are only one means of implementing secondary storage. Paper teletype tapes and magnetic tapes have also been used. (Ken Thompson and Dennnis Ritchie used paper tapes on early ports of Unix.) Today the personal computer standard is the hard drive.
The hard drive is analog - it's not digital. There's no one on one correspondence between the bits of a file and what's on disk. The hard drive is hopefully healthy enough to accurately read what's previously been written. The methodology isn't important - the fact it's an analog storage device might be.
Hard drives are like gramophone records except they're not. They have tracks but there's no groove that leads from the outer rim to the centre (as with Steve Jobs' favourite record Sgt Pepper). Hard drives have concentric tracks.
And there are more than two sides. A hard drive is not as a gramophone record with one side and then the b/w flip side. Hard drives are more like a stack. Think of a stack of CDs, or a stack of pancakes if you will. Whatever works. The analogy will hold.
Hard drives have more than a 'tone arm' capable of playing a single side at a time. Think of a fork with tynes that reach between each platter and access both the platter above and the one below as well as the uppermost and lowermost platters in the stack.
There are no grooves on a hard drive platter - it's all done through calibration. The drive controller (the fork) can know its position - on which of the concentric tracks. Floppy drives once had holes (more on this another time) but hard drives don't.
Unix systems ordinarily implement lazy write - data isn't written to disk because a user hits ⌘S. The write request goes into a cache instead. The drive controller reports back regularly to the file system with its track position, and the file system chooses a nearby write operation for the controller from its cache. This reduces wear and tear on the hard drive.
Accessing data on a hard drive is a matter of specifying a head, a cylinder, and a sector. The term track is also used.
A head is one of the fork tynes - accessing the upper side of one platter, the lower side of another. A track is the concentric controller location for any given head. A cylinder is the collection of tracks on all sides of all platters at a given distance in on the platter. A sector is a segment of a given track.
Cylinders are zero-indexed, as are heads. Sectors are usually one-indexed. So a specification of 'head:0 cylinder:0 sector:1' takes you to the very beginning of a disk.
Disks usually have sectors that can contain 512 bytes. There have been other sizes over the years but 512 bytes is more or less the norm today. Early floppies had sectors of 128, then 256 bytes.
There are no markings on the hard drive platters to show where one sector ends and the next one begins. This is done with soft sectoring (something the Woz was instrumental in doing for early Apple computers) whereby a specific sequence of values becomes the sector boundary.
The number of sectors per track is variable and dependent on the hard drive in question, as is the number of cylinders. The number of sectors per cylinder is of course the number of sectors per track times the number of heads.
Hard drives conduct I/O (input/output or read/write) on a per sector basis but files systems would rather not. It's just too much to worry about, to do. File systems of today (such as Apple's) normally cluster eight sectors together as the basic I/O unit. Eight sectors together at 512 bytes/sector means the basic I/O unit is 512 x 8 or 4096 bytes or 4 KB. This has certain ramifications.
A file of a single byte still occupies 4 KB of disk space. An empty Unix directory with only the entries '.' and '..' can seem to take only a few bytes but actually takes 4 KB. A file of 4097 bytes necessarily takes 8 KB in storage. And so forth.
The lowest possible logic level is the drive controller. The controller gets commands from the driver. These commands are for reading or writing one or more contiguous sectors on a given track. The controller is also given the storage location in primary (RAM) memory to either pick up or leave off the data.
The file system is layered on top of the disk driver. The file system deals in more abstract commands which are turned into things the driver understands. The file system has to find out where a file is stored or going to be stored. The logic seen in a user interface doesn't apply here. And there's a lot of bookkeeping going on.
A file system must obviously keep track of where files are stored and what sectors are still available. It must 'free' sectors previously used by files that have since been deleted, and it must update the bookkeeping to keep newly used sectors 'hands off' for future operations. So there is a ledger of a sort also stored on the hard drive. The generic term for this 'ledger' is the volume control block.
The architecture of volume control blocks varies with the file system in use. Studying the perhaps simplest (most simpleton) file system for personal computers, it's easy to see what's going on. And this file system is the original PC file system known sometimes as FAT.
FAT is a bit of a misnomer - it's actually the PC volume control block and it stands for file allocation table. PC disks have two - one for ordinary use and the other in case something goes physically wrong with the disk and the first becomes unusable.
[Peter Norton once quipped he didn't know why PCs had two FATs. Bill Gates has at times tried to claim the 'invention' of the FAT as his own. Ouch.]
FATs occur right after the volume sector. They're contiguous. They work as pigeon holes. Each pigeon hole in a FAT represents a cluster of sectors used for disk I/O. The value in the pigeon hole determines the status of the respective cluster. Pigeon holes can either be free for use, represent physically damaged clusters, or be part of a cluster chain that stores file data.
PC directory entries were of 32 bytes. The first 11 bytes were the file's OSI 8.3 name (with the dot preceding the extension implied by the existence of data in the final three bytes). The next byte (offset 11) held the bit-wise file attributes. Such as read-only, system, hidden, archive, volume label, and directory. Only 6 of the 8 possible values were ever in regular use.
An empty unused storage space followed after the attribute byte, then at the end further data was found: the file size and the start cluster. File I/O with FAT worked by picking up the value for the start cluster, accessing it, then jumping into a FAT at the respective offset to see where the next cluster in the chain was located. File I/O followed the FAT cluster chain until it reached the 'end of the line'. This could coordinate with the size of the file as given in the directory entry.
Unix uses a more sophisticated system as it was built for industrial use. The presupposition with Unix file systems is that there can be a lot of small files but when files start to grow in size, they can get really big.
A Unix volume control block contains information about what clusters are used for a file. The first entries denote actual clusters in use. A file of a single byte will have one such entry for one such 4 KB cluster to hold the single byte of the file.
[You can test this by creating a single byte file and then viewing file information in TFF. You'll be told the file takes 4 KB. And it does.]
The first entries point directly to clusters in use by the file, but after a while a Unix file system will allocate further entries not for file data but for pointers to additional clusters with file data. So at this point the pointers are indirect references.
Even this can prove too much after a while, so the Unix file system can allocate further clusters to store further clusters that contain pointers to actual clusters in use by the file - two levels of indirection. And even this can prove too much after a while, so the file system adds another level of indirection. And so forth.
The actual implementation depends on the file system. Apple's HFS differs from Unix file systems in many regards.
HFS vs Unix
Traditional Unix file systems use terminology like iblock, ilist, and inode to describe what's happening on disk.
Unix file systems don't use anything like the FAT model for directory entries either. The FAT model puts the file name in the directory entry but it also puts additional data there; Unix file systems put only the file name in the directory entry - alongside an inode (index) into the ilist (volume control block) that accesses the iblock (file information). The iblocks are of uniform size; a file system can locate the offset of the iblock for a given file in an ilist by multiplying the inode with the iblock size.
All file information (save the name itself) is stored in the volume control block. FAT didn't work like this and many other file systems don't either (and Apple's HFS is a critter apart). FAT by its very nature associated file information with a directory entry that contained this information; Unix file systems hold all this information independent of the file name itself. This means that theoretically it's possible to have another directory entry somewhere else on the same physical volume with the same inode - files can have more than one name.
[This becomes crucial later on when trying to 'delete' files. Unix files aren't 'deleted' - deleting only removes a 'link' from a name to an iblock. The file system keeps track of the number of links to the same physical storaage and only frees the storage when the link count hits zero.]
HFS and its descendants assume a one to one relationship between disk storage and file names. The people tasked with bridging this inconsistency when Apple moved to Unix worked very hard and long and did a pretty good job, but it's not a perfect fit - yet another 'denial of service' attack against Apple's HFS file system was published just the other day.
HFS doesn't have Unix terminology like iblocks, ilists, and inodes. HFS terminology about CNIDs - catalog node IDs. It's not important to grok the inner workings of an HFS file system but it is important to know HFS of today (HFSX, HFSX Case Sensitive, and so forth) have to support Unix intrinsics and to know a bit of the basics.
Support for Unix is established by using inode and CNID interchangeably: Unix APIs that are supposed to return an inode actually return a CNID. And internal HFS routines accept an inode as a CNID. And so forth.
Unix directories aren't sorted - file names are usually thrown in there in no particular order. HFS directories are always sorted. This makes file creation more expensive but accessing existing files faster. Apple use an unusual algorithm often described as a binary tree with three nodes [sic]. Whatever: it supports Unix file systems up to the point where it encounters physical storage represented by multiple names.
Knowing - or at least having a working understanding of - how this computer stuff works is something everyone wants: trying to make sense out of the incomprehensible. But it's easy to trip up on misconceptions.
- Files aren't gone because they're in the trash. They can still be accessed, read, written to.
- Files aren't necessarily gone because the trash has been emptied. Other links can remain.
- Files aren't necessarily gone even when they're completely unlinked. Their data can remain.
Microsoft's 'trash' system is actually better than Apple's. Recycle operations automatically create database entries for the original file paths so the files can be restored. Files are renamed to coordinate with the entries for the original paths. The Apple makeshift counterpart uses the abominable .DS_Store.
Real World Scenarios
The journey from a user decision to save a file to the actual disk controller operations.
- User uses the file system 'save as' dialog to choose a directory and a file name.
- Document controller prompts with the 'are you sure' dialog if the file already exists.
- Document controller opens/creates the file for writing, previous contents removed.
- The file system sets aside a new iblock for the file if it's new.
- The file system initialises the iblock data, setting the file size to zero.
- Document controller contacts client application. One of them writes to the file.
- The write call specifies the file and an address to the data buffer.
- File system finds the disk free space, updates its bookkeeping.
- File system sends chunks of file data (per track) to the device driver.
- Device driver contacts the controller, controller writes the bytes to disk.
The journey from a user decision to delete a file to the point where the file is (hopefully) gone.
- User selects a file and hits a keyboard shortcut to move the file to the 'trash'.
- The file isn't deleted in any way - it's only moved to a new location (mostly unseen).
- The default location is '~/.Trash' - a dotted directory in user root. The file is fine.
- User decides to 'empty trash', file manager unlinks the file. The file is still fine.
- File system decrements the file's link count.
- The storage for the file is freed if the link count is zero.
- Otherwise nothing more is done at this point - and the file is still fine.
Things to Think About
Keep in mind:
- Things are never as easy as they seem. Be thankful all this layered technology (mostly) works.
- The NeXTSTEP/Cocoa classes are a benefit to user and developer alike. They make things easier.
- OS X can sometimes tell you when a file's been moved but it can sometimes get the file name wrong.
- Deleting a file doesn't remove it.
- You have to 'remove' (unlink) all the file names.
- Unix won't find those file names and HFS can only find one.
- A file in the 'trash' is still fine.
- A file in the 'trash' hasn't been removed in any way.
- Even emptying the 'trash' doesn't mean the file is really gone.
ACP Users: More File Stuff
ACP Users: More File Stuff (2)
Learning Curve: Files, Ownership, Permissions, Stuff Like That
Learning Curve: Files, Ownership, Permissions, Stuff Like That (2)