About | Buy Stuff | Industry Watch | Learning Curve | Products | Search | Twitter

3662262

Back to HFS.



(Developers: download the NSWorkspace 'proof of concept' - 2,095 bytes.)

Try this experiment.

  1. Make a screen dump with ⇧⌘4. This creates 'Picture 1.pdf' on your desktop.

  2. Double-click 'Picture 1.pdf' to open it in Preview.

  3. Open Terminal, go to your Desktop, and type in:

    ln 'Picture 1.pdf' 'Picture 2.pdf'

  4. Rotate the image in Preview and then hit ⌘S (Save).

What happened?

Hard Links

Unix is fairly alone in the world with its hard links (that was a hard link you created in the command line above). Actually the name 'hard link' is a bit of a misnomer: Unix admits of what might be called 'multi-linked files'.

Directories contain information about the files within. Unix directories contain only file names and inodes. An inode is a numerical index into a control block on each physical drive called the ilist. All file information - save the file name - is in the ilist. It's impossible to backtrack from data in the ilist to a file name.

Any number of file names can point to the same physical storage on disk. When you issued the 'ln' command above, what happened was this.

  1. The 'ln' program looked in the directory 'Desktop' for the record for 'Picture 1.pdf'.

  2. This record contains the inode for 'Picture 1.pdf'.

  3. The 'ln' program assigned the same inode to 'Picture 2.pdf'.
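What 'ln' does here comes down to a single system call. The following is a minimal sketch using the POSIX link() and stat() calls; the helper names add_name and same_file are made up for illustration and are not part of any library.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Give an existing file a second name, as 'ln existing new_name' would.
   Returns 0 on success, -1 on failure. */
int add_name(const char *existing, const char *new_name)
{
    if (link(existing, new_name) != 0) {
        fprintf(stderr, "link: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Returns 1 if both names lead to the same inode on the same device,
   0 if they do not, -1 if either name cannot be examined. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Calling add_name("Picture 1.pdf", "Picture 2.pdf") from the Desktop directory should leave same_file() reporting 1 for the two names.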

If you issue an 'ls -il' command in the Desktop directory, you will see that the files share the same inode and that both now have a link count of 2 - there are two links to the same inode. The inode is in the leftmost column; the link count is right before the owner and group.

553824 -rw-r--r--  2 rixstep  staff  3334 Jun 21 18:50 Picture 1.pdf
553824 -rw-r--r--  2 rixstep  staff  3334 Jun 21 18:50 Picture 2.pdf
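The two columns of interest can also be read programmatically. This is a small sketch built on the POSIX stat() call; inode_and_links and print_entry are illustrative names, and the output format is a cut-down imitation of 'ls -il', not a reimplementation of it.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Fetch the inode number and link count for one file name.
   Returns 0 on success, -1 if the name cannot be examined. */
int inode_and_links(const char *path, unsigned long long *inode,
                    unsigned long *links)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *inode = (unsigned long long)st.st_ino;
    *links = (unsigned long)st.st_nlink;
    return 0;
}

/* Print "inode links name" for one file name. */
void print_entry(const char *path)
{
    unsigned long long inode;
    unsigned long links;

    if (inode_and_links(path, &inode, &links) == 0)
        printf("%llu %lu %s\n", inode, links, path);
}
```

Run against both names after the 'ln' command, the inode values should match and the link count should be 2 for each.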

Unix needs this link count in the ilist information because it's going to have to free this storage when the count gets back to zero. In Unix you never remove a file - you unlink a file name from it.
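The effect of unlinking on the link count can be observed directly. A minimal sketch with the POSIX unlink() and stat() calls follows; links_for and unlink_and_count are hypothetical helper names.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Link count behind a name, or -1 if the name cannot be examined. */
long links_for(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Unlink one name, then report how many names the storage has left.
   'other' is a second name for the same file and is used only to look
   at the count afterwards. Returns -1 on error. */
long unlink_and_count(const char *name, const char *other)
{
    if (unlink(name) != 0)
        return -1;
    return links_for(other);
}
```

With two names for one file, unlinking either of them should bring the count back to 1 and leave the data readable through the other name; only unlinking the last name lets the system free the storage.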

There's really nothing more to the whole thing because the design here is so simple yet effective (some would say elegant).

HFS+

Hierarchical File System Plus (Extended or + or even 'X') is an outgrowth of HFS, the file system Apple introduced for the Macintosh in 1985 to replace the original Macintosh File System of 1984.

It uses a complex 'B-tree' (B*-tree) algorithm for storing file records in what is known as a catalog file. The catalog file contains three types of records: records for directories, records for files, and special 'thread' records that help the system find its directories and files. The key of an HFS+ catalog record looks like this.

struct HFSPlusCatalogKey {
 UInt16 keyLength;          // 16-bit unsigned integer
 HFSCatalogNodeID parentID; // 32-bit unsigned integer
 HFSUniStr255 nodeName;     // Unicode string
};

The first field keyLength gives the length of the key (which is variable, and which does not count the keyLength field itself); the second field parentID gives the catalog node ID (or 'CNID') of the file's parent folder (this is 32-bit and so allows for roughly 4 billion CNIDs on a file system); the third and final field nodeName gives the file's name - which may be up to 255 Unicode characters in length.
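The arithmetic behind the key can be sketched as follows. The type names are simplified stand-ins built on <stdint.h>, not Apple's headers, and the helper functions are illustrative only; the assumption is that a key stores two bytes per name character plus a two-byte name length.

```c
#include <stdint.h>

/* Simplified stand-ins for the types in the structure above. */
typedef uint32_t CatalogNodeID;

typedef struct {
    uint16_t length;        /* number of 16-bit characters in use */
    uint16_t unicode[255];
} UniStr255;

typedef struct {
    uint16_t keyLength;     /* bytes that follow this field */
    CatalogNodeID parentID;
    UniStr255 nodeName;
} CatalogKey;

/* keyLength covers parentID (4 bytes), the name's length field (2 bytes)
   and two bytes for every name character actually used. */
uint16_t catalog_key_length(uint16_t name_chars)
{
    return (uint16_t)(sizeof(CatalogNodeID) + sizeof(uint16_t)
                      + 2u * name_chars);
}

/* Number of distinct values a 32-bit CNID can take. */
uint64_t cnid_space(void)
{
    return (uint64_t)UINT32_MAX + 1;
}
```

Under these assumptions an empty name gives a key length of 6 bytes, a 255-character name gives 516, and cnid_space() returns 4294967296.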

You can 'backtrack' with HFS+: given a CNID, you can find the fully qualified ('absolute') path to the file name.

  1. Take the nodeName field in the record.
  2. Look up the record for the parent folder's CNID parentID.
  3. If the parent folder also has a parent folder, look up that parentID too.
  4. And so forth - now put all those names together.
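The walk up the parent chain can be sketched with a toy in-memory catalog. Everything here is an illustration: ToyRecord, find_record, and path_for_cnid are invented names, the linear search stands in for the real B-tree lookup, and the only HFS+ detail relied on is that the root folder has CNID 2.

```c
#include <stdint.h>
#include <string.h>

/* A toy catalog entry: just enough to walk upwards. */
typedef struct {
    uint32_t cnid;
    uint32_t parentID;
    const char *name;
} ToyRecord;

#define ROOT_FOLDER_ID 2u   /* CNID of the root folder */

static const ToyRecord *find_record(const ToyRecord *cat, size_t n,
                                    uint32_t cnid)
{
    for (size_t i = 0; i < n; i++)
        if (cat[i].cnid == cnid)
            return &cat[i];
    return NULL;
}

/* Build the absolute path for 'cnid' into 'out'. Returns 0 on success,
   -1 if a record is missing or a buffer is too small. */
int path_for_cnid(const ToyRecord *cat, size_t n, uint32_t cnid,
                  char *out, size_t out_size)
{
    char tmp[1024];
    size_t used = 0;              /* bytes filled at the END of tmp */

    tmp[sizeof tmp - 1] = '\0';
    while (cnid != ROOT_FOLDER_ID) {
        const ToyRecord *r = find_record(cat, n, cnid);
        if (r == NULL)
            return -1;
        size_t len = strlen(r->name);
        if (used + len + 1 >= sizeof tmp)
            return -1;
        /* Prepend "/name" in front of what has been collected so far. */
        used += len;
        memcpy(tmp + sizeof tmp - 1 - used, r->name, len);
        used += 1;
        tmp[sizeof tmp - 1 - used] = '/';
        cnid = r->parentID;       /* climb one level */
    }
    if (used == 0) {              /* the root folder itself */
        if (out_size < 2)
            return -1;
        strcpy(out, "/");
        return 0;
    }
    if (used + 1 > out_size)
        return -1;
    memcpy(out, tmp + sizeof tmp - 1 - used, used + 1);
    return 0;
}
```

The sketch works only because each record has exactly one parentID and one name, which is the assumption the text goes on to question.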

HFS+ can do this without a murmur because it assumes there is only one file name per file information record. But what happens if you suddenly have to support Unix hard links?

Apple Technical Note TN1150 gives the details. It's very complex.

Apple Technical Note TN1150, HFS Plus Volume Format

Hard links in HFS Plus are represented by a set of several files. The actual file content (which is shared by each of the hard links) is stored in a special indirect node file. This indirect node file is the equivalent of an inode in a traditional UNIX file system.

HFS Plus uses special hard link files (or links) to refer (or point) to an indirect node file. There is one hard link file for each directory entry or name that refers to the file content.

Indirect node files exist in a special directory called the metadata directory. This directory exists in the volume's root directory.

The name of the metadata directory is four null characters followed by the string 'HFS+ Private Data'.

\0\0\0\0HFS+ Private Data

The directory's creation date is set to the creation date of the volume's root directory. The kIsInvisible and kNameLocked bits are set in the directory's Finder information.

The icon location in the Finder info is set to the point (22460, 22460).

The inode number returned by the stat and lstat routines is actually the catalog node ID of the indirect node file, not the link reference mentioned above.

Hard link files are ordinary files in the catalog. The catalog node ID of a hard link file is different from the catalog node ID of the indirect node file it refers to, and different from the catalog node ID of any other hard link file.

enum {
 kHardLinkFileType = 0x686C6E6B, /* 'hlnk' */
 kHFSPlusCreator = 0x6866732B /* 'hfs+' */
};

There are two types of files in the catalog file: ordinary 'hfs+' files and new special 'hlnk' files. The new 'hlnk' files have records just like their 'hfs+' counterparts, but they don't yield their own CNID when questioned; they yield the CNID of the indirect node file they point to.
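The two constants in the enum are four-character codes: four ASCII bytes packed into one 32-bit number. Going by the constant names, 'hlnk' is a file type and 'hfs+' a creator code. The sketch below shows the packing and a check for both values; four_char_code and is_hard_link_file are illustrative names, not Apple functions.

```c
#include <stdint.h>

/* Pack four ASCII characters into a 32-bit code, first character in the
   most significant byte. */
uint32_t four_char_code(const char s[4])
{
    return ((uint32_t)(unsigned char)s[0] << 24) |
           ((uint32_t)(unsigned char)s[1] << 16) |
           ((uint32_t)(unsigned char)s[2] << 8)  |
            (uint32_t)(unsigned char)s[3];
}

/* True if a record's type and creator mark it as a hard link file. */
int is_hard_link_file(uint32_t file_type, uint32_t creator)
{
    return file_type == 0x686C6E6Bu    /* 'hlnk' */
        && creator   == 0x6866732Bu;   /* 'hfs+' */
}
```

four_char_code("hlnk") yields 0x686C6E6B and four_char_code("hfs+") yields 0x6866732B, matching the enum.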

All indirect node files are in the folder \0\0\0\0HFS+ Private Data located in the root directory.

As long as a file is not multi-linked, everything is fine; when it gets its first 'hard link' some major shuffling is needed.

  1. Create the directory \0\0\0\0HFS+ Private Data if it doesn't already exist.
  2. Grab the record for the original file (and its CNID) and put it in \0\0\0\0HFS+ Private Data.
  3. Transform the original record from type 'hfs+' to type 'hlnk' and give it a new CNID (and make sure the thread CNID points to this new record instead).
  4. Make a similar 'hlnk' record for the new file name.
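The shuffle in the steps above can be modelled on a toy in-memory catalog. This is a sketch of the bookkeeping only, not of the HFS+ driver: the structures, the function names, and the CNID chosen for the private folder are all hypothetical, and step 1 (creating the folder) is taken as already done. Instead of rewriting the original record in place, the model moves it and adds a fresh 'hlnk' record under the old name, which leaves the same end state.

```c
#include <stdint.h>
#include <string.h>

#define PRIVATE_DIR_ID 100u   /* hypothetical CNID of the private folder */
#define TOY_MAX 16

typedef enum { KIND_ORDINARY, KIND_HLNK } Kind;

typedef struct {
    uint32_t cnid;
    uint32_t parentID;
    char name[64];
    Kind kind;
    uint32_t target;          /* KIND_HLNK only: CNID of the real record */
} ToyRecord;

typedef struct {
    ToyRecord rec[TOY_MAX];
    size_t count;
    uint32_t next_cnid;       /* next unused CNID */
} ToyCatalog;

static ToyRecord *add_record(ToyCatalog *c, uint32_t parent,
                             const char *name, Kind kind, uint32_t target)
{
    if (c->count >= TOY_MAX)
        return NULL;
    ToyRecord *r = &c->rec[c->count++];
    r->cnid = c->next_cnid++;
    r->parentID = parent;
    strncpy(r->name, name, sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0';
    r->kind = kind;
    r->target = target;
    return r;
}

/* Give an ordinary file its first hard link. Returns 0 on success. */
int toy_first_link(ToyCatalog *c, size_t index, const char *new_name)
{
    if (index >= c->count || c->rec[index].kind != KIND_ORDINARY)
        return -1;
    if (c->count + 2 > TOY_MAX)
        return -1;

    ToyRecord *orig = &c->rec[index];
    uint32_t dir = orig->parentID;
    char old_name[64];
    strcpy(old_name, orig->name);

    /* Step 2: the original record keeps its CNID but moves away. */
    orig->parentID = PRIVATE_DIR_ID;
    /* Steps 3 and 4: one 'hlnk' record per visible name, new CNIDs. */
    add_record(c, dir, old_name, KIND_HLNK, orig->cnid);
    add_record(c, dir, new_name, KIND_HLNK, orig->cnid);
    return 0;
}

/* The inode a stat-like call would report for a record. */
uint32_t toy_inode(const ToyRecord *r)
{
    return r->kind == KIND_HLNK ? r->target : r->cnid;
}
```

After toy_first_link() both visible names report the same toy_inode(), yet neither visible record carries the CNID the file had before.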

And of course HFS+ has to switch this all back the day the link count gets back to 1 again.

  1. Remove the record for the file name being unlinked.
  2. Grab the record in \0\0\0\0HFS+ Private Data and move it back (it keeps the same CNID).
  3. Remove the one remaining 'hlnk' record and see to it that the thread CNID now points to the original 'hfs+' file record again.
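The way back can be modelled the same way. Again this is a toy sketch with hypothetical names and a hypothetical CNID for the private folder; it follows the three steps as listed here and makes no claim about what the real driver does internally.

```c
#include <stdint.h>
#include <string.h>

#define PRIVATE_DIR_ID 100u   /* hypothetical CNID of the private folder */

typedef struct {
    uint32_t cnid;
    uint32_t parentID;
    char name[64];
    int is_hlnk;              /* 1 for a 'hlnk' record, 0 for ordinary */
    uint32_t target;          /* 'hlnk' only: CNID of the real record */
    int live;                 /* 0 once the record has been removed */
} ToyRecord;

static ToyRecord *find_cnid(ToyRecord *rec, size_t n, uint32_t cnid)
{
    for (size_t i = 0; i < n; i++)
        if (rec[i].live && rec[i].cnid == cnid)
            return &rec[i];
    return NULL;
}

static size_t links_to(const ToyRecord *rec, size_t n, uint32_t target)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (rec[i].live && rec[i].is_hlnk && rec[i].target == target)
            count++;
    return count;
}

/* Unlink one 'hlnk' name. If exactly one name is left afterwards, fold
   everything back into a single ordinary record. Returns 0 on success. */
int toy_unlink(ToyRecord *rec, size_t n, size_t index)
{
    if (index >= n || !rec[index].live || !rec[index].is_hlnk)
        return -1;
    uint32_t target = rec[index].target;
    ToyRecord *node = find_cnid(rec, n, target);
    if (node == NULL)
        return -1;

    rec[index].live = 0;                    /* step 1 */
    if (links_to(rec, n, target) != 1)
        return 0;                           /* nothing to fold back yet */

    for (size_t i = 0; i < n; i++) {
        if (rec[i].live && rec[i].is_hlnk && rec[i].target == target) {
            /* Step 2: the real record returns, CNID unchanged. */
            node->parentID = rec[i].parentID;
            strcpy(node->name, rec[i].name);
            /* Step 3: the last 'hlnk' record is no longer needed. */
            rec[i].live = 0;
        }
    }
    return 0;
}
```

Starting from one real record in the private folder and two 'hlnk' records, a single toy_unlink() leaves one ordinary record under the surviving name.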

Pretty complicated.

So why did Preview screw up?

Carbon, Cocoa, & NSDocumentController



Carbon still doesn't admit of Unix paths. Carbon, like MacOS before OS X, wants colons (':') as path component separators, which leaves the slash free for use in file names. Try putting a slash in a file name through the Finder and HFS will store the slash in the file name as is. But slashes are illegal in Unix filenames: they're the path component separators. So before the HFS driver gives the file names back again, the slashes are converted to colons - if the source of the request is from the 'Unix' side of OS X that is.
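The character conversion itself is trivial. The sketch below swaps the two separators in place; swap_separators is an illustrative name, and nothing here is taken from the actual HFS driver.

```c
#include <stddef.h>

/* Swap '/' and ':' in a file name, in place. Applying the function
   twice restores the original name. */
void swap_separators(char *name)
{
    for (size_t i = 0; name[i] != '\0'; i++) {
        if (name[i] == '/')
            name[i] = ':';
        else if (name[i] == ':')
            name[i] = '/';
    }
}
```

A name seen as "6/21 notes" on one side of the system becomes "6:21 notes" on the other, and the same call converts it back.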

NSDocumentController controls all documents in OS X (together with NSUserDefaults). When running on HFS+, it creates records which rely on HFS+ catalog file information.

    <key>NSRecentDocumentRecords</key>
    <array>
        <dict>
            <key>_NSLocator</key>
            <dict>
                <key>_NSAlias</key>
                <data>
                AAAAAAF6AAIAAAxNYWNpbnRvc2ggSEQAAAAAAAAAAAAA
                AAAAAAC80Wl7SCsAAAAA8l0EMS54YgAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAfqQbz5js8AAAAAAAAAAP//
                //8AAAEgAAAAAAAAAAAAAAAAAAAAC3B1YmxpY19odG1s
                AAAQAAgAALzRobsAAAARAAgAALz5xw8AAAABABQAAPJd
                AADyUgAAfwYAAH7MAAAcWwACADlNYWNpbnRvc2ggSEQ6
                VXNlcnM6cml4c3RlcDpTaXRlczpSaXhzdGVwOnB1Ymxp
                Y19odG1sOjEueGIAAA4ACgAEADEALgB4AGIADwAaAAwA
                TQBhAGMAaQBuAHQAbwBzAGgAIABIAEQAEgAsVXNlcnMv
                cml4c3RlcC9TaXRlcy9SaXhzdGVwL3B1YmxpY19odG1s
                LzEueGIAEwABLwD//wAA
                </data>
            </dict>
        </dict>

It is for this reason that a Cocoa document can know immediately when a file name has changed: the _NSAlias lets NSDocumentController 'backtrack'.

But this system also assumes there is but one file name for physical storage on disk. If the file becomes multi-linked, the old file record is moved to \0\0\0\0HFS+ Private Data and a new file record (of a new type) is written to its own directory to replace it.

And NSDocumentController is thrown for a loop.

This also shows up if two programs edit the same file. If Unix paths were used, nothing would happen; but Unix paths are not used - CNIDs are. When the one editor overwrites the file, the CNID is lost - relegated into oblivion - and OS X cries out for help.
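One way an editor can 'overwrite' a file is to write a fresh copy and rename it over the old name. Whether Preview or NSDocumentController saves in exactly this way is an assumption made for the sake of the sketch; the text does not say so. The sketch uses only POSIX and standard C calls, and the helper names save_by_replacing and inode_of are made up.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write 'text' to a temporary file next to 'path', then rename it over
   'path'. Returns 0 on success, -1 on failure. */
int save_by_replacing(const char *path, const char *text)
{
    char tmp[1024];

    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    if (fputs(text, f) == EOF) {
        fclose(f);
        unlink(tmp);
        return -1;
    }
    if (fclose(f) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}

/* Inode number behind a name, or 0 if the name cannot be examined. */
unsigned long long inode_of(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? (unsigned long long)st.st_ino : 0;
}
```

After such a save the file name refers to new storage with a new identity, while any second name created with 'ln' still refers to the old storage. Anything that tracked the file by its old identity rather than by its path has lost it.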

The irony, as any OS X user knows, is that the 'Save As' button leads irrevocably to the same file name, leading one to wonder why the dialog is necessary. But NSDocumentController isn't worrying about Unix paths; it's worrying about CNIDs.

Bottom Line

Not all OS X applications will behave as Preview. Not all applications are authentic Cocoa applications. For example, TextEdit (which is not fully integrated into Cocoa) will destroy the multi-link when encountering difficulties with NSDocumentController - this with no regard to why NSDocumentController was having difficulties.

Carbon applications may behave even more strangely.

And it is possible to corrupt your hard drive this way - although the test above with Preview won't.

It's also possible to get even stranger results if the file extensions are removed and the files renamed to begin with a dot ('.').

But NeXTSTEP, the system Apple bought from Steve Jobs, never had this problem because NeXTSTEP ran on the Unix File System (UFS).

HFS was 'retrofitted' to support Unix because of the 'Macintosh' legacy.

Can you force a square peg in a round hole? Why would you even want to try?

Postscript: Elvis and the Rat

The name of the deliberately elusive HFS directory has changed. And so has the method of keeping it under the radar.

The old name was '\0\0\0\0HFS+ Private Data'. The new name is '\xe2\x90\x80\xe2\x90\x80\xe2\x90\x80\xe2\x90\x80HFS+ Private Data' (each \xe2\x90\x80 is the UTF-8 encoding of U+2400, the 'symbol for null' character). But there's more.

The old name turned up in the directory nlink count. The new one doesn't.

What Apple are pulling here is the old 'zero inode trick': the file system sets an inode to zero when it's completely unlinking a file and prior to actually removing it from the directory. So Apple give their HFS directory a zero inode - and as it's not in queue for deletion by the file system it remains untouched and ignored by Unix APIs.

And Apple do this for two further files hidden in root: .journal and .journal_info_block. Elvis never left the building.

Copyright © Rixstep. All rights reserved.