Of Executables & Icons, Files & Documents
And how they relate on Windows, Unix, and OS X.
First the definitions. Let's keep this simple and not too technical.
- Executable: a file that 'runs'.
- Icon: an image used to represent a file.
- File: an item (directory or otherwise) in the file system.
- Document: a file with a known association.
Windows relies exclusively on file extensions to determine what 'runs'. There are at least four types of Windows executables with the extensions BAT, COM, EXE, and PIF. Even further extensions such as VBS may be classified as 'executable'.
Early PC programs used the CP/M COM format. COM files are statically linked. All addresses are baked in. The COM file must run at its predefined address as all other addresses in the image are dependent on it. Technically the term 'executable' is used to distinguish more modern images from COM images. The Windows command interpreter COMMAND.COM is a COM image.
The 'executable' format, most often using the EXE extension, contains a relocatable image and relocatable addresses. Addresses in the image are relative to the suggested load address of the image itself. On load the system will recalculate the addresses in the image's 'relocation table' to get everything set up right. Additional external references to DLLs are resolved at this time as well. DLLs are normally 'based' to have a predetermined starting address in the address space of the client process; they can be 'rebased' by the program loader if their memory spans overlap.
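The fix-up step can be pictured with a toy model. This is only a sketch under simplifying assumptions: the image is a plain list of words, the 'relocation table' is a list of indices into it, and the names are illustrative rather than the real on-disk structures.

```python
def relocate(words, reloc_indices, preferred_base, actual_base):
    """Return a copy of the image with each relocatable address adjusted."""
    # Every address listed in the table moves by the same distance.
    delta = actual_base - preferred_base
    patched = list(words)
    for i in reloc_indices:
        patched[i] += delta
    return patched


# Two addresses (indices 0 and 2) and one ordinary value (index 1).
image = [0x00401000, 0x90, 0x00402010]
moved = relocate(image, [0, 2], preferred_base=0x00400000, actual_base=0x00500000)
print([hex(w) for w in moved])  # ['0x501000', '0x90', '0x502010']
```

If the image loads at its suggested address the delta is zero and nothing changes, which is why 'basing' DLLs sensibly saves work at load time.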
PIF files accommodate earlier 'PC' images and are actually configuration files for their use. They contain a path to the actual target and other settings.
BAT files are 'batch files' - a series of 'shell commands' for execution, as it were, right from the command line. Microsoft command line shells are typically not very powerful. The NT command line shell (cmd.exe) is better than COMMAND.COM, but it still leaves a lot to be desired when compared to those available on Unix.
In Windows the extension is traditionally stored separately from the filename itself. The 'dot' in a filename with extension is implied and never stored on disk. Sending a Windows user a file named 'filename.' will most likely result in an on disk file named 'filename' (without the dot). As nothing follows the dot, there is no extension; as there is no extension, the dot is not used.
The ILOVEYOU exploit worked by tricking people into opening attachments they received in the Outlook mail client. The message prompted people to open a file called 'ILOVEYOU.txt.vbs'. VBS ('Visual Basic Script') was at the time a fairly new extension recognised by Windows.
By default, the extension VBS would not be shown. In their eagerness to dumb things down, Microsoft found a way to supposedly 'hide' extensions. This was considered more 'user friendly'. Unfortunately Microsoft dumbed the algorithm down too: it didn't hide all extensions, only the last one. What people receiving the Love Bug bomb saw was therefore 'ILOVEYOU.txt'.
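The effect of hiding only the last extension is easy to reproduce. The function name below is made up for illustration; the sketch simply assumes the rule is 'strip whatever follows the final dot'.

```python
import os.path


def displayed_name(filename):
    """Drop only the final extension, leaving any earlier 'extension' visible."""
    stem, _last_ext = os.path.splitext(filename)
    return stem


print(displayed_name("ILOVEYOU.txt.vbs"))  # ILOVEYOU.txt
print(displayed_name("report.doc"))        # report
```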
But the icon was wrong. As Windows knew the real file type of the extension, it scurried through its Registry and grabbed a 'scripting' icon very similar to the one used by AppleScript. People receiving the file saw a scripting icon instead.
This was evidently not enough to set off the warning sirens. People opened the attachment anyway.
There is no way in Windows to specify an icon or program for a single file. Everything is dependent on the file extension. Like Solaris, Windows can however have a 'default file viewer'. This is normally not set in Windows (whereas in Solaris it's often a text editor). NeXTSTEP, through its shell and 'open' command, will default to TextEdit for opening files of unknown type.
Windows has no special file attributes to approve or disapprove of running executables. Windows goes solely by extension. Which extensions count as executable is a global system setting. They include BAT, COM, EXE, and PIF. The order is decisive: if two files with the same name are found at the same location, the one with the higher priority extension is run.
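A rough sketch of that lookup follows. The priority list here is an assumption (it follows the classic DOS order of COM before EXE before BAT, with PIF appended); on a real system the order comes from system configuration.

```python
# Assumed priority order, highest first. Not read from any real system.
PRIORITY = [".COM", ".EXE", ".BAT", ".PIF"]


def resolve(command, files_in_dir, priority=PRIORITY):
    """Return the file that would run for a bare command name, or None."""
    present = {name.upper() for name in files_in_dir}
    for ext in priority:
        candidate = command.upper() + ext
        if candidate in present:
            return candidate
    return None


print(resolve("setup", ["setup.bat", "setup.exe", "readme.txt"]))  # SETUP.EXE
```

Note that nothing in the lookup inspects the file itself: the name alone decides.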
Giving the wrong extension to an executable can result in trouble.
In contrast to Unix, there is no attribute specifically allowing a file to be 'run'. Classic MS-DOS has only six attributes: read-only, hidden, system, volume, directory, and archive. These are bitwise flags stored in the directory entry.
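As an illustration, the six flags can be decoded from a single attribute byte. The bit values are those of the FAT directory entry; the helper name and the sample byte are made up.

```python
# Bit values of the attribute byte in a FAT directory entry.
ATTRIBUTES = {
    0x01: "read-only",
    0x02: "hidden",
    0x04: "system",
    0x08: "volume",
    0x10: "directory",
    0x20: "archive",
}


def decode(attr_byte):
    """List the names of the flags set in an attribute byte."""
    return [name for bit, name in ATTRIBUTES.items() if attr_byte & bit]


print(decode(0x23))  # ['read-only', 'hidden', 'archive']
```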
There is neither a flag to allow executables to run nor a flag to stop them.
Windows expects the magic 'MZ' for executables. 'MZ' starts the old 'DOS' header. Programs that run in other environments have further information. For example, Win32 executables have their so-called 'portable executable' header following the 'MZ' header and starting with the four byte magic 'PE\0\0'.
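A minimal check for the two signatures might look like this. One detail goes beyond the text above: the DOS header stores the offset of the 'PE' header as a little-endian 32-bit value at byte 0x3C. The function name and the synthetic test buffer are illustrative only.

```python
import struct


def looks_like_pe(data):
    """True if the bytes start with 'MZ' and point at a 'PE\\0\\0' signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # Offset of the PE header, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\0\0"


# Build a synthetic header rather than reading a real executable.
fake = bytearray(0x44)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\0\0"

print(looks_like_pe(bytes(fake)))        # True
print(looks_like_pe(b"MZ" + bytes(62)))  # False
```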
Windows can today store file attributes in a 32-bit 'double word' instead of a single byte. Many attributes are still not used, still not defined. Newcomers include device, temporary, sparse file, reparse point, compressed, offline, not content indexed, and encrypted. There are no flags that allow executables to run. New flags could be added to stop them from running, but they would require a good file system or Registry hack and cannot be generalised.
Windows gets display icons through its Registry. Icons can be in standalone files or buried inside executables, even DLLs. It is the Registry which tells what icon is to be used for which file type.
Given a file, Windows plucks the extension, then traverses the Registry hive found at HKEY_CLASSES_ROOT (HKCR). This is an alias into HKEY_LOCAL_MACHINE\Software\Classes. As the name implies, this is machine specific, and so associations cannot be configured for individual users on the same machine.
Windows now searches for the extension in this hierarchy - the 'subkeys' to HKCR. If it finds a match, it takes the 'default value' for the subkey - this is the associated 'file type'.
Windows now traverses the same hierarchy again, looking for a subkey with the same name as the file type. If this subkey is found, data for the default display icon and the default editor can also be found. It is this information Windows uses to display file icons and to open documents.
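The two passes can be sketched with a dictionary standing in for HKCR. Everything in the table is invented for illustration (the file type name, the icon and the editor); only the shape of the lookup follows the description above, with the empty string standing for a subkey's 'default value'.

```python
import os.path

# Hypothetical stand-in for HKEY_CLASSES_ROOT; all values are made up.
HKCR = {
    ".txt": {"": "txtfile"},
    ".log": {"": "txtfile"},
    "txtfile": {
        "DefaultIcon": "example.dll,0",
        r"shell\open\command": 'editor.exe "%1"',
    },
}


def lookup(filename, registry=HKCR):
    """Return (file type, icon, open command) for a filename, or None."""
    _, ext = os.path.splitext(filename)
    ext_key = registry.get(ext.lower())      # first pass: the extension
    if ext_key is None:
        return None
    file_type = ext_key.get("")              # its default value
    type_key = registry.get(file_type)       # second pass: the file type
    if type_key is None:
        return None
    return file_type, type_key.get("DefaultIcon"), type_key.get(r"shell\open\command")


print(lookup("notes.LOG"))  # ('txtfile', 'example.dll,0', 'editor.exe "%1"')
print(lookup("mystery"))    # None
```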
Windows distinguishes between extensions and file types so that files with several different extensions can be grouped together. For every file type the Registry can then associate both an executable to open the files and an icon to use to display them.
Although Microsoft's NTFS file system has 'streams', Windows does not otherwise implement anything remotely resembling HFS resource forks.
Unix does not recognise the extension. More to the point - Unix has no file extensions. If a dot is used in a filename, the dot exists. Filenames such as '...' and '....' are fully legal.
Unix determines what runs and what doesn't run by looking at the items' modes. The modes specify access rights for the file's owner, the file's group, and everyone else.
Already here Unix exhibits a concept not found on standalone systems: resource ownership. All files on a Unix system are 'owned' and their owners can allocate or remove access rights.
Access rights are a combination of read rights, write rights, and execute rights.
A program file not marked with an appropriate executable right cannot be run - not even by its owner.
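The mode bits can be inspected with Python's standard stat module. The helper below is a simplified sketch: it only tests the relevant execute bit in a mode value and ignores everything else a real kernel takes into account.

```python
import stat


def may_execute(mode, who):
    """Test the execute bit for 'owner', 'group' or 'other' in a mode value."""
    bit = {
        "owner": stat.S_IXUSR,
        "group": stat.S_IXGRP,
        "other": stat.S_IXOTH,
    }[who]
    return bool(mode & bit)


print(stat.filemode(stat.S_IFREG | 0o644))  # -rw-r--r--
print(may_execute(0o644, "owner"))          # False
print(stat.filemode(stat.S_IFREG | 0o754))  # -rwxr-xr--
print(may_execute(0o754, "group"))          # True
print(may_execute(0o754, "other"))          # False
```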
Noteworthy also is that in contrast to Windows and other standalone systems, Unix can prevent users - even the file's owner - from even reading files. The read bit must be set for the appropriate user.
Unix also uses the 'magic' system to determine the exact nature of a file, as exemplified by its program 'file'. 'file' consults a 'magic' database (traditionally kept in /etc, though the location varies between systems) to inspect the file contents. It determines the type heuristically, so the method is not foolproof.
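The idea behind 'magic' can be shown with a handful of well-known leading byte sequences. This is a toy table, not the real database, which is far larger and tests much more than the first few bytes.

```python
# A few well-known signatures; the real magic database is far richer.
MAGIC = [
    (b"\x7fELF", "ELF executable or object"),
    (b"#!", "script with interpreter line"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"MZ", "DOS/Windows executable"),
]


def guess_type(data):
    """Guess a file's type from its leading bytes; 'data' if nothing matches."""
    for prefix, description in MAGIC:
        if data.startswith(prefix):
            return description
    return "data"


print(guess_type(b"#!/bin/sh\necho hello\n"))  # script with interpreter line
print(guess_type(b"\x7fELF\x02\x01\x01"))      # ELF executable or object
print(guess_type(b"plain words"))              # data
```

The filename never enters into it, which is the point: the guess is made from the contents alone, and it is only a guess.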
What icons, if any, Unix will display for a given file is up to the 'desktop'. Likewise if Unix wants to associate a file with an executable, it is the 'desktop' which will make the association.
Unix cannot establish a 'one on one' association with a specific file.
OS X is a blend of classic beige box Macintosh ideas and Unix ideas. The beige box Macintosh used first the file system MFS and then its successor HFS along with creator codes and file types to determine how a file behaved.
MacOS has no means to allow or prevent executables from running.
Creator codes and file types are both four byte (double word) fields typically filled with alphabetic characters. Apple reserve all lower case combinations.
The creator code must be unique. (Considering the limitations, this is a praiseworthy goal but little more.)
The creator code and file type of a given file are stored in the file system's volume control block in the first half of a 16 byte field with 'Finder info'.
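Reading the two codes out of a 16 byte Finder info field might look like the sketch below. It assumes the file type comes first and the creator code second, both stored big-endian, and uses 'TEXT' and 'ttxt' purely as sample values; the function name is made up.

```python
import struct


def type_and_creator(finder_info):
    """Return (file type, creator code) from a 16 byte Finder info field."""
    if len(finder_info) != 16:
        raise ValueError("Finder info must be exactly 16 bytes")
    file_type, creator = struct.unpack_from(">4s4s", finder_info, 0)
    return file_type.decode("mac_roman"), creator.decode("mac_roman")


sample = b"TEXT" + b"ttxt" + bytes(8)
print(type_and_creator(sample))             # ('TEXT', 'ttxt')

# Each code is just a 32-bit number that happens to spell something.
print(hex(int.from_bytes(b"TEXT", "big")))  # 0x54455854
```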
The history of MacOS executables is even murkier than on the PC. The CPU architecture changed; 'fat' binaries shipped; executable code was placed both in the data fork and the resource fork.
OS X launch services contain information for associating executables with creator codes, extensions, file types, and URL schemes.
OS X also has an 'open with' feature. A file can be tied to an executable through use of so called 'usro' data in the file's resource fork. This setting will override all other settings. If the file loses its resource fork, the association will revert.
After usro data, the creator code and file type together have the highest priority. Lacking these fields in the volume control block, the launch services will go after the file extension and URL scheme.
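The order of precedence can be sketched as a chain of fallbacks. The data structures and application names here are hypothetical and are not the launch services API; the sketch only mirrors the ordering described above.

```python
def pick_application(item, bindings):
    """Choose an application for an item, highest priority source first."""
    # A per-file 'usro' setting overrides everything else.
    if item.get("usro"):
        return item["usro"]
    candidates = [
        ("creator+type", (item.get("creator"), item.get("type"))),
        ("extension", item.get("extension")),
        ("scheme", item.get("scheme")),
    ]
    for kind, value in candidates:
        app = bindings.get((kind, value))
        if app is not None:
            return app
    return None


# Hypothetical bindings table.
bindings = {
    ("creator+type", ("ttxt", "TEXT")): "SimpleText",
    ("extension", "txt"): "TextEdit",
}

print(pick_application({"creator": "ttxt", "type": "TEXT", "extension": "txt"}, bindings))  # SimpleText
print(pick_application({"extension": "txt"}, bindings))                                     # TextEdit
print(pick_application({"usro": "Other.app", "extension": "txt"}, bindings))                # Other.app
```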
The Finder info field can also be used to override the default display icon setting for the file. If the flag 'kHasCustomIcon' is set, the shell will look for the icon in the file's resource fork.
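Testing the flag is a single bit operation. The sketch assumes the Finder flags are a big-endian 16-bit value at offset 8 of the Finder info field, directly after the file type and creator code, and that kHasCustomIcon has the value 0x0400; the function name is illustrative.

```python
import struct

K_HAS_CUSTOM_ICON = 0x0400  # assumed value of the kHasCustomIcon flag


def has_custom_icon(finder_info):
    """True if the custom icon flag is set in a 16 byte Finder info field."""
    (flags,) = struct.unpack_from(">H", finder_info, 8)
    return bool(flags & K_HAS_CUSTOM_ICON)


plain = b"TEXT" + b"ttxt" + bytes(8)
custom = b"TEXT" + b"ttxt" + struct.pack(">H", K_HAS_CUSTOM_ICON) + bytes(6)

print(has_custom_icon(plain))   # False
print(has_custom_icon(custom))  # True
```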
The ILOVEYOU exploit worked because millions of users didn't react to seeing the wrong icon for an attachment whose name was displayed as 'ILOVEYOU.txt'. The real extension of the attachment wasn't seen because Microsoft, in their eagerness to make their Windows more 'user friendly', chose a flawed algorithm with which to do this.
Similar exploits on OS X have the advantage of being able to disguise the display icon as well. Because Windows cannot create a 'one on one' association with a file, Windows users have a chance; as this 'one on one' association is possible on OS X, that system's users don't have the same chance.
Hiding files on Windows or Unix is difficult; with a file system such as HFS which has resource forks, the task is much easier. The Oompa Loompa exploit hid full executables inside its own resource fork and overwrote the executables with its own data fork.
Because OS X isn't a straight OPENSTEP implementation but a succession of concessions to older standalone beige box MacOS ideas, too many methods of determining file type compete concurrently, and the information offered to the user is confusing and contradictory - or deliberately misleading.