HFS: The Good & The Bad
Is it something to keep?
With Sun Microsystems' Zettabyte File System (ZFS) coming to OS X Leopard, the discussion opens up again.
HFS - or HFS Plus, Mac OS Extended, HFSX, or HFS Case Sensitive - is Apple's default file system for OS X. The name 'HFS' with no further qualifier denotes the traditional file system of the Apple Macintosh. It's not the first Macintosh file system, but it's the one that has survived for the past twenty years or so.
HFS has come under heavy criticism - not least from this site - but there are good things about it. This article attempts to enumerate, evaluate, and finally weigh both the good and the bad about HFS.
HFS is the successor to the original Macintosh file system, called simply the Macintosh File System (MFS). Put bluntly, MFS was an abortion. It was an attempt to organise data around the 'files in folders' paradigm of the original Macintosh, whose design team set out to emulate what they had seen at the Xerox Palo Alto Research Center (PARC), home of Alan Kay's Learning Research Group. There were two platforms to emulate: the Xerox Star and Alan Kay's Smalltalk-80. Both systems used a 'files in folders' paradigm on screen.
MFS made no attempt to back the on screen hierarchy of the Macintosh with a matching on disk model: the file system was flat, and folders existed only as an illusion maintained by the Finder. This proved problematic, so MFS was eventually scrapped and work on HFS began. HFS stands simply for 'Hierarchical File System', but it was far from the first: the Unix file system came much earlier, and even Unix wasn't the first (Multics preceded it), although Unix did take hierarchical file systems mainstream.
Having a file system that in some way mirrors the on screen metaphors of the user platform is of course a decided advantage. [One might go further and speculate the on screen experience should instead be a reflection of the on disk reality. Whoa. Ed.]
The Good
There are good things about HFS. There aren't many but HFS does have things other file systems do not.
1. Fast directory listings. HFS keeps every directory sorted as files are created, renamed, and deleted: its catalog is a B-tree ordered by parent directory and file name. The contents of any directory are therefore already in order when it comes time to list them. This is undoubtedly a speed boost for HFS file managers such as Finder and Path Finder. [On the other hand these applications can't handle non-HFS volumes.] And although file operations take longer to complete because directories must be kept sorted, that delay is arguably less noticeable to the user.
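The effect of a catalog ordered by parent directory and name can be modelled with a sorted list. This is a toy sketch, not the HFS Plus B-tree: `ToyCatalog` is a hypothetical class, and `str.casefold()` stands in for HFS Plus's own case-insensitive Unicode ordering. The sorting cost is paid on every insert; a directory listing is then just a range scan.

```python
import bisect

class ToyCatalog:
    """Hypothetical toy model of an HFS-style catalog.

    One ordered index keyed by (parent directory ID, name). Real HFS Plus
    uses an on-disk B-tree and its own Unicode name ordering; casefold()
    is an assumed stand-in for that ordering.
    """

    def __init__(self):
        self._keys = []  # kept sorted: (parent_id, folded_name, name)

    def add(self, parent_id, name):
        # The cost of keeping directories sorted is paid here, per operation.
        bisect.insort(self._keys, (parent_id, name.casefold(), name))

    def remove(self, parent_id, name):
        key = (parent_id, name.casefold(), name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]

    def list_dir(self, parent_id):
        # Entries for one directory are contiguous and already in order,
        # so listing needs no sort: (p,) sorts before every (p, ...) key.
        lo = bisect.bisect_left(self._keys, (parent_id,))
        hi = bisect.bisect_left(self._keys, (parent_id + 1,))
        return [name for _, _, name in self._keys[lo:hi]]

if __name__ == "__main__":
    cat = ToyCatalog()
    for n in ["zeta.txt", "Alpha.txt", "beta.txt"]:
        cat.add(2, n)  # 2 is the root folder's CNID in HFS Plus
    print(cat.list_dir(2))  # ['Alpha.txt', 'beta.txt', 'zeta.txt']
```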
2. CNIDs to identify and locate files. HFS doesn't use paths in the ordinary sense [and can't for that matter understand so called 'relative paths']. But through the use of CNIDs (catalog node IDs) it can locate a file even if it moves. The benefit is seen immediately in OS X 'save as' dialogs when the underlying target file has been moved. The Cocoa document controller notes the CNID and the Unix path no longer match and asks the user to resolve the conflict.
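The idea that a file is identified by a number rather than by its path can be observed from the Unix side too. The sketch below assumes a POSIX system; `file_id` and `id_survives_move` are hypothetical helpers, not Apple APIs. It moves a file into another folder and checks that its device and inode number are unchanged. On an HFS Plus volume that inode number is the CNID; the sketch does not show how the Cocoa document controller itself tracks files.

```python
import os
import tempfile

def file_id(path):
    """Hypothetical helper: (device, inode number) for path.

    On HFS Plus the inode number reported by stat() is the file's CNID;
    on other Unix file systems it is the ordinary inode number.
    """
    st = os.stat(path)
    return st.st_dev, st.st_ino

def id_survives_move(directory):
    """Hypothetical helper: create a file, move it, compare IDs."""
    old = os.path.join(directory, "report.txt")
    with open(old, "w") as f:
        f.write("draft\n")
    before = file_id(old)

    os.mkdir(os.path.join(directory, "archive"))
    new = os.path.join(directory, "archive", "report.txt")
    os.rename(old, new)  # a move within one volume keeps the file's ID

    return before == file_id(new)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(id_survives_move(d))  # True
```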
3. Creator codes and file types. HFS can store a four-character code for the application that created a file, and another for the file's generic type, in its volume control block. These codes sit alongside the 'Finder flags' in what is commonly called the Finder info. In theory this frees the system from depending on file name extensions to identify files. It also helps the system suggest other applications for a file when the default editor is not found or not wanted.
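On OS X the Finder info for a file is a 32-byte record (readable, for instance, through the com.apple.FinderInfo extended attribute). The sketch below decodes the type, creator, flags, and icon location from its first 16 bytes, assuming the big-endian layout of Apple's FileInfo structure. `parse_file_finder_info` is a hypothetical helper, and reading the attribute itself is left out.

```python
import struct

def parse_file_finder_info(info: bytes) -> dict:
    """Hypothetical helper: decode the first 16 bytes of a file's Finder info.

    Assumed layout (big-endian, as in Apple's FileInfo): 4-byte file type,
    4-byte creator, 2-byte Finder flags, icon location as two signed
    2-byte values (v, h), and a 2-byte reserved field.
    """
    if len(info) < 16:
        raise ValueError("Finder info must be at least 16 bytes")
    ftype, creator, flags, v, h, _reserved = struct.unpack(">4s4sHhhH", info[:16])
    return {
        "type": ftype.decode("mac_roman"),
        "creator": creator.decode("mac_roman"),
        "flags": flags,
        "location": (v, h),
    }

if __name__ == "__main__":
    # A plain text file with creator 'ttxt', no flags set, icon at (10, 20).
    record = b"TEXTttxt" + struct.pack(">HhhH", 0, 10, 20, 0) + bytes(16)
    print(parse_file_finder_info(record))
```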
Anyway: those in a nutshell are the advantages of HFS. As warned they are not many in number.
The Bad
This is a much longer list. Its length and nature are in large part determined by the agenda of Apple and OS X today.
1. It's not POSIX compliant. POSIX is a loose definition of what a Unix system should do and be capable of. Richard M Stallman suggested the name, and rumour has it that it actually means 'piece of shit' + IX. Whatever: there are more flavours of Unix than of Baskin-Robbins ice cream, but after all the scuffles and shuffles the Unix people attempted - somewhat successfully - to agree on and adhere to a standard. And POSIX is that standard.
But HFS doesn't fit the mould. Apple have performed some incredible circus tricks to achieve as much compliance as they can, but they admit they're far from 100%. In fact there is little chance they can get much further: basic design traits of HFS and the old Macintosh set them further apart from the POSIX standard than most other systems.
Endemic to HFS 'thinking' is the corollary 'one file, one name'. The fact that the OS X document controller can locate a file unambiguously by its CNID presumes that the same file is never represented by different names - paths - on disk. But multiple names for one file are very much a part of POSIX and one of the major cool things about it.
What POSIX refers to as 'hard links' are simply other paths (names) for the same physical files. Creating a file is the same process as always but it is possible to create additional names and paths for existing files. As the structure of the POSIX file system is dramatically different the method of creating these 'hard links' will be alien to HFS 'thinking'.
When creating a hard link the file system first retrieves the so called 'inode' of the targeted file. The inode number is roughly the counterpart of the HFS CNID (HFS today in fact reports the CNID wherever Unix code expects an inode number), but it refers to a block of volume control data that works rather differently.
The inode number is but an index - a numerical value - into the Unix volume control block, one logically contiguous area of a volume known as the inode table. In the simplest layout, multiplying the index by the size of one inode gives the offset in the table where that particular file's information can be found.
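The offset arithmetic comes down to a single multiplication. A minimal sketch, assuming one contiguous inode table as in the simplified description; `inode_offset` is a hypothetical helper, and real Unix file systems spread the table across cylinder or block groups.

```python
def inode_offset(ino, inode_size, table_start=0, first_ino=0):
    """Hypothetical helper: byte offset of an inode in one contiguous table.

    Some file systems number inodes from 1 rather than 0; pass
    first_ino=1 for those.
    """
    if ino < first_ino:
        raise ValueError("inode number below the first valid inode")
    return table_start + (ino - first_ino) * inode_size

if __name__ == "__main__":
    print(inode_offset(5, 128))                                 # 640
    print(inode_offset(5, 128, table_start=4096, first_ino=1))  # 4608
```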
The difference is that an actual Unix directory contains nothing more than the name of each file and its inode number. As the inode holds all the file's information - including its physical location(s) on disk - and is totally separate from the file name and from the directory in which that name resides, it becomes possible (and even desirable) to create further directory entries pointing to the same inode. These further links are referred to as 'hard links'.
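From user space a hard link is a single call to link(2). The sketch below assumes a POSIX system whose file system supports hard links; `link_demo` is a hypothetical helper. It gives one file a second name, confirms that both names lead to one inode with a link count of 2, then removes the first name and reads the data through the second.

```python
import os
import tempfile

def link_demo(directory):
    """Hypothetical helper: one file, two names, then one name removed."""
    original = os.path.join(directory, "notes.txt")
    second = os.path.join(directory, "same-notes.txt")
    with open(original, "w") as f:
        f.write("one file, two names\n")

    os.link(original, second)  # a new directory entry for the same inode

    a, b = os.stat(original), os.stat(second)
    same_inode = a.st_ino == b.st_ino
    link_count = a.st_nlink    # number of names pointing at the inode

    os.remove(original)        # drop one name; the data stays reachable
    with open(second) as f:
        survivor = f.read()
    return same_inode, link_count, survivor

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(link_demo(d))  # (True, 2, 'one file, two names\n')
```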
Hard links are used throughout OS X and are one of the many undeniably cool things about the system. The standard 'bin' directories on any OS X system are riddled with such files. But they're fundamentally at odds with HFS, which can attempt to cope with them only through some incredible imagination stretching and hoop jumping.
HFS has to maintain a hidden directory at the root of every volume to hold the actual data of hard linked files; each link is only a stub pointing into it. This directory has changed name over time; the current name is the following. [The '\x' escape sequences denote raw byte values in hexadecimal. Each group \xE2\x90\x80 is the UTF-8 encoding of U+2400 SYMBOL FOR NULL, a character many tools cannot display, which is why question marks often appear in its place.]
\xE2\x90\x80\xE2\x90\x80\xE2\x90\x80\xE2\x90\x80HFS+ Private Data
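Decoding the raw bytes of the name as UTF-8 shows what the four leading characters are. The sketch takes each of them to be the three bytes E2 90 80 as seen from the Unix side; `RAW_NAME` and `private_dir_name` are hypothetical names for illustration.

```python
import unicodedata

# Hypothetical constant: the directory name's raw bytes as seen from the
# Unix side, taking each of the four leading characters to be E2 90 80.
RAW_NAME = b"\xe2\x90\x80" * 4 + b"HFS+ Private Data"

def private_dir_name() -> str:
    """Hypothetical helper: decode the raw bytes as UTF-8."""
    return RAW_NAME.decode("utf-8")

if __name__ == "__main__":
    name = private_dir_name()
    print(len(name))                  # 21
    print(unicodedata.name(name[0]))  # SYMBOL FOR NULL
```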
Unix hard links are called 'multi-links' in HFS jargon, and references to them appear when running fsck or a front end such as Disk Utility. The point is that they don't work very well: their use can result in corrupted volumes, and accessing them through code layers that rely on HFS intrinsics can end up breaking the links themselves - with aggravating results.
Part of the reason Apple keep the 'Unix' part of an OS X drive so closed off is that any attempt to access files there through the graphical applications of the system can break the system.
2. Application specific data doesn't belong in a volume control block anyway. The designers of the original Macintosh - beyond being under a lot of pressure to get their product to market - had a view of the computer as something hermetically sealed. There was no perceived need to set boundaries as would be done in a more robust system. The data needed to manage files at application level could be put into sensitive volume control blocks. Unfortunately this doesn't work out too well with OS X today.
It just doesn't make any sense and is a Bad Thing™ to mix things up in this way. A system exists beyond its chosen 'shell' and file system intrinsics should never be relegated to a user land application anyway. It's just Bad Design™.
Today the 'Finder flags' in HFS volume control blocks even include 'colours' - seven arbitrary ones encoded in three otherwise unused bits of the flags. Needless to say this has no relevance to POSIX.
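The colour label is simply a three-bit field inside the 16-bit Finder flags: three bits give eight values, zero meaning no label, which leaves the seven colours. A sketch of reading and setting it, assuming the mask 0x000E (Apple's kColor); `label_index` and `with_label` are hypothetical helpers, and which colour each number maps to is left to Finder.

```python
K_COLOR_MASK = 0x000E  # bits 1-3 of the Finder flags (Apple's kColor)

def label_index(finder_flags: int) -> int:
    """Hypothetical helper: label number, 0 = none, 1-7 = a colour."""
    return (finder_flags & K_COLOR_MASK) >> 1

def with_label(finder_flags: int, index: int) -> int:
    """Hypothetical helper: flags with the label bits replaced by index."""
    if not 0 <= index <= 7:
        raise ValueError("label index must be 0-7")
    # Clear bits 1-3 (keeping the result within 16 bits), then set them.
    return (finder_flags & ~K_COLOR_MASK & 0xFFFF) | (index << 1)

if __name__ == "__main__":
    flags = with_label(0x4000, 6)         # other flag bits are preserved
    print(hex(flags), label_index(flags))  # 0x400c 6
```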
3. You can't marry widely disparate operating systems and file systems without disastrous results. David Cutler - who likes C but never liked Unix - said an operating system should never be the brainchild of a 'committee of PhDs' but that of one person only. Operating systems are strange and sensitive creatures; when design traits start contradicting one another sooner or later there's an intolerable price to pay.
Ars Technica's John Siracusa once lamented the demise of NeXTSTEP - the system OS X is based on - but then went on to admonish Apple to 'keep up the good fight' and continue compromising and damaging it. A stranger demonstration of insanity has rarely been witnessed in this context. It's OK to stick with basic Macintosh paradigms (if they hold) just as it's OK to stick with POSIX paradigms, but you have to make up your mind. OS X has suffered all along from this reluctance to settle on a single paradigm and make the system simpler, more reliable, and easier to use.
4. HFS never admitted of anything approaching Unix file system paths. HFS uses a colon as a path component separator; Unix systems use the forward slash both as a path component separator and as the start of a 'full path'. And the forward slash at the start of a 'full path' also represents the otherwise unnamed 'root' directory on a volume.
All paths in HFS are full paths; it is not possible to 'slalom' around with '..' and so forth to move from one location to another. HFS doesn't really have a concept of 'current working directory' either. The leftmost component of an HFS path must be an item directly accessible at the equivalent of the 'root' level.
Use of a colon in HFS paths therefore prevents use of a colon as part of a file name - whilst there is no rule against using forward slashes in file names. As OS X attempts to be POSIX compliant at the same time as it runs HFS certain anomalies will occur - anomalies addressed in different ways by Apple depending on the current release of OS X. None of these sidestepping schemes work - all that changes is the ineffective way of not really dealing with the anomalies.
File dialogs prior to Tiger would substitute characters in a file name to keep it neutral: neither forward slashes nor colons were allowed. Starting with Tiger all characters seem to be allowed, but in fact they are not.
At the application level it seems possible to put forward slashes in file names but if the user tries using these file names at a command line disaster will result. And if a file is saved at the application level with such a name it will appear at the command line to have colons and not forward slashes in its name instead.
HFS must make sure the Unix API layer of OS X does not see forward slashes in file names but at the same time cannot save colons in file names itself. Files on disk are saved with forward slashes and not colons; when representing these files internally or through the OS X document controller they appear to have forward slashes and not colons - when accessed purely through the POSIX layer they will appear to have colons and not slashes.
It's a situation begging for destruction.
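The name mapping described in this section boils down to swapping ':' and '/' within each name, plus changing the separator between components. The sketch below is an approximation under stated assumptions (boot volume at '/', other volumes under '/Volumes'); `swap_separators` and `hfs_path_to_posix` are hypothetical helpers, not Apple's implementation.

```python
def swap_separators(name: str) -> str:
    """Hypothetical helper: swap ':' and '/' in a single file name.

    A name that Cocoa or Carbon shows as 'Q1/Q2' appears at the command
    line as 'Q1:Q2', and vice versa, so the mapping is its own inverse.
    """
    return name.translate(str.maketrans({":": "/", "/": ":"}))

def hfs_path_to_posix(hfs_path: str, boot_volume: str) -> str:
    """Hypothetical helper: colon-separated HFS full path to POSIX path.

    Assumptions: the first component is a volume name; the boot volume
    maps to '/', any other volume to '/Volumes/<name>'; empty components
    (such as a trailing colon on a folder) are dropped.
    """
    volume, *rest = hfs_path.split(":")
    root = "/" if volume == boot_volume else "/Volumes/" + swap_separators(volume)
    parts = [swap_separators(p) for p in rest if p]
    if not parts:
        return root
    return root.rstrip("/") + "/" + "/".join(parts)

if __name__ == "__main__":
    print(hfs_path_to_posix("Macintosh HD:Users:me:Q1/Q2 report", "Macintosh HD"))
    # /Users/me/Q1:Q2 report
```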
5. Lots of cruft. HFS today has lots of cruft lying about - and in the volume control blocks too - that simply isn't used anymore. It has five time stamps to the three of Unix; it has a whole slew of 'Finder flags' that are obsolete; in general it has volume control block information - as stated previously - that shouldn't be there anyway because it's got nothing to do with controlling the volume.
As HFS is organised differently there's going to be a significant overhead. HFS has a pointer to a file's parent directory; Unix does not - inode data doesn't know or concern itself about it. POSIX has no 'sharing flags', 'user privileges', unused reserved fields, 'text encoding hints', location in Finder window, screen coordinates for folders, or any of that. Some flags are back in use today (eg 'hide file extensions' uses 0x10) but most of these flags have no relevance anymore.
6. Special folders needed. Not only does OS X need a special hidden folder to store POSIX 'hard links', it also needs a protected and somewhat hidden virtual file system to support older Macintosh applications and APIs which cannot understand file paths. It appears as a special directory at the root of the system: '/.vol'. A full explanation of how this 'secret directory' works is beyond the scope of this article, but 'Carbon' APIs can 'go in there' with their ordinary HFS file identification data (a volume ID plus a CNID) and come back out with the POSIX file access they need. [Yes it's spooky.]
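Roughly, a '/.vol' path is built from two numbers: the volume's device ID and the file's ID, which on HFS is the CNID. A sketch assuming the usual '/.vol/<volume ID>/<file ID>' form; `volfs_path` is a hypothetical helper, and the resulting path only resolves on OS X systems that provide '/.vol'.

```python
import os

def volfs_path(path: str) -> str:
    """Hypothetical helper: the '/.vol/<volume ID>/<file ID>' form of a path.

    Assumes the volume ID is stat()'s st_dev and the file ID is st_ino,
    which on HFS volumes is the file's CNID.
    """
    st = os.stat(path)
    return "/.vol/%d/%d" % (st.st_dev, st.st_ino)

if __name__ == "__main__":
    print(volfs_path("."))  # e.g. /.vol/<device>/<file ID>
```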
7. Corrupt volumes. HFS is a lot easier to corrupt than typical Unix file systems because it's so inordinately and unreasonably complex. The old golden engineering rule of 'keep the number of moving parts to a minimum' seems to have been lost on its creators.
HFS volumes have in the past been notorious for breaking down, leading to a disproportionate reliance on third party tools such as TechTool and DiskWarrior. Trying to defragment an HFS volume has never been a good idea and has often resulted in a need to use one of those tools again. [Today HFS performs a modicum of this task itself, but only on the fly as files are opened, and only for relatively small files that are badly fragmented. Rearranging all files on an HFS volume at once with otherwise standard tools is still not recommended.]
8. Resource forks. Any file system with 'alternate streams' is inherently susceptible to exploit. NTFS, the file system of Cutler's Windows NT line, is one: its alternate streams - which can be used to hide malicious code and data - don't show up in ordinary directory listings, can't be opened unless their exact names are known, and aren't commonly accessible by the user.
HFS is known to support resource forks but what is not widely understood is that it can support an arbitrary number of additional forks. The resource fork is actually named 'RESOURCE_FORK' and so is accessible through HFS related APIs. Creation of additional forks represents a further security risk. The Oompa Loompa exploit used the security weakness of HFS forks to corrupt OS X machines.
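From the Unix side of OS X a file's resource fork can be reached by appending '..namedfork/rsrc' to its path. A sketch; `resource_fork_path` and `resource_fork_size` are hypothetical helpers, and on systems without this convention the size function simply reports 0.

```python
import os

def resource_fork_path(path: str) -> str:
    """Hypothetical helper: the path OS X's POSIX layer uses for a resource fork."""
    return os.path.join(path, "..namedfork", "rsrc")

def resource_fork_size(path: str) -> int:
    """Hypothetical helper: size in bytes of a file's resource fork, else 0."""
    try:
        return os.path.getsize(resource_fork_path(path))
    except OSError:
        # Not OS X, no resource fork, or no such file: report 0.
        return 0

if __name__ == "__main__":
    print(resource_fork_path("/tmp/example.txt"))  # /tmp/example.txt/..namedfork/rsrc
```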
But resource forks as used by HFS are not compatible with any other file system, and any reliance on them makes transferring files to other systems problematic. A now well known Apple technical note (TN2034) pointed this out to third party developers 'going back to school' for OS X and was met with an insane outcry from traditional 'MacOS' programmers who couldn't accept having another operating system to work with. The worst of this embarrassing incident was when John Siracusa began an online protest against the technical note. A counterattack led by seasoned programmers and NeXTSTEP and Unix adepts followed.
Today Apple attempt to instill the belief in users that anything not compatible with HFS is 'Microsoft Windows oriented' when nothing could be further from the truth. Apple's mail program gives users an option to send 'Windows friendly attachments' when such attachments are in fact needed for any system other than OS X. The HFS resource fork is simply not transportable across the net, as TN2034 pointed out. Regardless of whether one intends to stick with a confusing (and confused) file system, using the Internet today means transferring data, and computers running HFS can get themselves and the computers they communicate with into a lot of trouble.
9. 'Yes we can hide it.' Many Apple technologies (and major industrial contracts) rely on the ability of HFS to help hide data from computer users - not keep it out of the way as with Unix but literally hide it. One of the 'Finder flags' stored in the catalog record of each HFS file and folder (the record a CNID identifies), 'kIsInvisible', tells Finder and related code not to let on about the item's existence.
Apple's attempt to stop people sharing their iTunes song stashes through their iPods is predicated on the use of HFS on these devices and on setting the 'kIsInvisible' flag on the song hives. Naturally the great majority of OS X users have access only to Apple tools for accessing the files on their system and as a result see nothing (and hopefully do nothing). But this 'protection' can't really protect anything at all: any code that ignores the flag will see the iPod songs and be capable of transferring them.
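The 'protection' amounts to one bit that well-behaved browsers choose to respect. The sketch below assumes Apple's kIsInvisible value of 0x4000 and the flags' position at byte offset 8 of a 32-byte Finder info record (the same offset for files and folders). It contrasts what a Finder-like browser lists with what any tool ignoring the flag sees; `is_marked_invisible`, `visible_names`, and `all_names` are hypothetical helpers.

```python
import struct

K_IS_INVISIBLE = 0x4000  # Finder flag bit (Apple's kIsInvisible)

def is_marked_invisible(finder_info: bytes) -> bool:
    """Hypothetical helper: True if the kIsInvisible bit is set.

    The 16-bit big-endian flags follow the first 8 bytes of the record.
    """
    (flags,) = struct.unpack_from(">H", finder_info, 8)
    return bool(flags & K_IS_INVISIBLE)

def visible_names(entries):
    """Hypothetical helper: what a Finder-like browser chooses to show."""
    return [name for name, info in entries if not is_marked_invisible(info)]

def all_names(entries):
    """Hypothetical helper: what any tool that ignores the flag sees."""
    return [name for name, _ in entries]

if __name__ == "__main__":
    hidden = bytes(8) + struct.pack(">H", K_IS_INVISIBLE) + bytes(22)
    entries = [("Songs", hidden), ("Notes", bytes(32))]
    print(visible_names(entries))  # ['Notes']
    print(all_names(entries))      # ['Songs', 'Notes']
```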
The Good vs The Bad
Some of the things HFS is good for are also part of what it is bad for. The bad things greatly outnumber the good things. And the HFS file system is both the cause and the effect of other 'bad things' in OS X. Most important for the future of OS X is that the designers 'make up their minds' - or more correctly convince the others that they should do so.
There's hardly a doubt OS X would have made it 'out the door' years earlier had it not been for the protests of traditional 'MacOS' users and third party developers: OPENSTEP was already a viable and marketed system. Had this been possible the world of computing might look a lot different - and a lot better - today.
OS X came to Cupertino before Microsoft consolidated their position with Windows 98, shortly after the release of Cutler's Windows NT 4.0 in the summer of 1996, and years before the releases of Windows Me, Windows 2000, and Windows XP. Had the computing world seen OS X instead of the abortive Microsoft systems, the majority of users would likely have happily left the Redmond company behind. But as it turned out all the world saw was an iMac running the notoriously arcane and crash prone 'MacOS', whilst NeXT and Apple engineers were forced against their own better judgement to create a 'hybrid' that pundits would gradually praise but still criticise as a 'hodgepodge'.
It's impossible to backtrack. One of the primary reasons for hanging onto HFS has been the 'Carbon' API - but this wobbly critter is going to be rightfully jettisoned by Apple in the move to 64-bit computing with OS X Leopard. Together with all the other incompatibilities HFS foists on OS X users, it becomes obvious which will be the next candidate to be sent packing.
The only way to move is forward.