Rixstep
 About | ACP | Buy | Industry Watch | Learning Curve | News | Search | Test

Spotlight on Spotlight

A brief look at Apple's new search technology.


Apple's new search technology is called 'Spotlight'. It is integrated tightly into OS X 10.4 'Tiger'.

Google were first with a desktop search technology - and suffered greatly at the hands of security pundits. Representatives of the company were made to come forward and admit the technology was in fact not suitable in a variety of common computing situations.

Microsoft also have a series of related search technologies to be previewed this year and slated for release with Windows Longhorn due by the end of 2006.

Spotlight is Apple's submission to the desktop search war. It shipped first with OS X 10.4 'Tiger' released on 29 April 2005.

Metadata

Spotlight is organised around the concept of 'metadata'. Metadata is data about a file, rather than the actual content stored in the file.

Spotlight can organise modification dates, file types, and paths on its own; applications running under OS X 'Tiger' are encouraged to add to Spotlight's capabilities by providing 'Spotlight importers'.

Apple already provide Spotlight importers for a variety of file formats including RTF, JPEG, Mail, PDF, and MP3; custom document formats require new metadata importers of their own.
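
To make the distinction between metadata and content concrete, here is a minimal Python sketch that gathers the three items mentioned above (modification date, file type, path) without reading the file's contents. The function name `basic_metadata` is invented for illustration, and guessing the type from the file extension is an assumption made for brevity; it is not how Spotlight itself identifies files.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def basic_metadata(path):
    """Collect metadata that needs no knowledge of the file's contents."""
    info = os.stat(path)
    # Assumption: the type is guessed from the extension only.
    guessed_type, _ = mimetypes.guess_type(path)
    return {
        "path": os.path.abspath(path),
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "type": guessed_type or "unknown",
    }


if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
        handle.write(b"hello")
    try:
        print(basic_metadata(handle.name))
    finally:
        os.remove(handle.name)
```

Anything beyond this, such as the author of a PDF or the artist of an MP3, requires code that understands the format, which is the job of an importer.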

Spotlight is defined and implemented as a part of the 'Carbon' layer of OS X. Applications can summon the Spotlight search window and conduct queries either through the Spotlight API or through Objective-C/Cocoa wrappers for this API.

Mining

Every time a file is created, modified, or deleted, the OS X kernel notifies Spotlight, which then updates its system data for the changed file. Using Launch Services, Spotlight first determines the four-byte file type and thereafter attempts to find a corresponding importer plugin.

The importer will then read the file and construct a 'dictionary' with appropriate metadata. Spotlight will thereafter integrate the dictionary in its system store.
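
The flow just described (change notification, importer lookup by file type, dictionary construction, integration into the store) can be modelled in a few lines of Python. This is only a toy model: every name in it (`IMPORTERS`, `STORE`, `on_file_changed` and so on) is hypothetical, and real importers are plugins loaded by the system, not Python functions.

```python
# Hypothetical model of the import pipeline; none of these names are Apple API.
IMPORTERS = {}   # file type -> importer function
STORE = {}       # path -> metadata dictionary


def importer(file_type):
    """Register a function as the importer for one file type."""
    def register(func):
        IMPORTERS[file_type] = func
        return func
    return register


@importer("TEXT")
def import_text(path, content):
    """Build the metadata dictionary for a plain text file."""
    words = content.split()
    return {"word_count": len(words), "first_word": words[0] if words else ""}


def on_file_changed(path, file_type, content):
    """Stand-in for the notification sent when a file is created or modified."""
    handler = IMPORTERS.get(file_type)
    if handler is None:
        return False  # no importer registered for this type
    STORE[path] = handler(path, content)  # integrate the dictionary
    return True


def on_file_deleted(path):
    """Stand-in for the notification sent when a file is deleted."""
    STORE.pop(path, None)


if __name__ == "__main__":
    on_file_changed("/tmp/notes.txt", "TEXT", "spotlight on spotlight")
    print(STORE)
```

The point of the lookup table is that the indexer itself never needs to understand a file format; it only needs to find the code that does.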

Attributes

It is the responsibility of the application to decide what information a user will find helpful. The data that are to be extracted from files are assigned metadata attributes.

Spotlight provides a wide range of standard metadata attributes for use by applications.

If the attributes an application needs are not already provided by Spotlight, it's important to see if other software houses have already created similar attributes: these should be used when possible instead of creating new ones.
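
The order of preference described here (standard attributes first, attributes already defined by others second, new ones last) can be sketched as follows. `kMDItemTitle`, `kMDItemAuthors`, and `kMDItemKeywords` are standard Spotlight attribute names; the helper `attribute_key`, the `shared` table, and the vendor-prefixed shape of new names are assumptions made for illustration.

```python
# A few attribute names that Spotlight defines; the full list is far longer.
STANDARD_ATTRIBUTES = {
    "title": "kMDItemTitle",
    "authors": "kMDItemAuthors",
    "keywords": "kMDItemKeywords",
}


def attribute_key(concept, vendor_prefix, shared=None):
    """Pick an attribute name, preferring ones that already exist."""
    if concept in STANDARD_ATTRIBUTES:
        return STANDARD_ATTRIBUTES[concept]
    if shared and concept in shared:
        return shared[concept]  # reuse another vendor's attribute
    # Last resort: a new name (hypothetical naming scheme).
    return f"{vendor_prefix}_{concept}"


if __name__ == "__main__":
    print(attribute_key("title", "com_example_app"))
    print(attribute_key("tempo", "com_example_app"))
```

Reusing names matters because a query for one attribute finds documents from every application that populates it.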

Security, Privacy

On OS X systems with separate user accounts, Spotlight respects the ownership of user files, even though the system store is shared. Spotlight automatically filters query results to remove files a user is not permitted to use.
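
The idea of a shared store with per-user filtering of results can be illustrated with a deliberately simplified model. The `IndexedFile` record and the single `world_readable` flag are assumptions; real permission checks involve owner, group, and mode bits, and nothing here reflects how Spotlight actually implements the filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedFile:
    """One entry in a shared index (hypothetical structure)."""
    path: str
    owner: str
    world_readable: bool = False


def visible_results(matches, user):
    """Drop matches the querying user may not read."""
    return [m for m in matches if m.owner == user or m.world_readable]


if __name__ == "__main__":
    shared_store = [
        IndexedFile("/Users/alice/diary.rtf", "alice"),
        IndexedFile("/Users/bob/taxes.pdf", "bob"),
        IndexedFile("/Users/Shared/menu.pdf", "bob", world_readable=True),
    ]
    for match in visible_results(shared_store, "alice"):
        print(match.path)
```

Note that the filter is applied to query results only: the store itself still holds entries for every user's files.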

Developers should nevertheless be aware of users' privacy and security concerns when extracting metadata from documents. What are the implications of making the file data searchable? Should full content be indexed, or only selected fields?

Developers must also provide users with information about which data will be indexed by their importer plugins, and should consider providing preferences that allow user control over what data is extracted.

Importer plugins normally reside inside an application package; when first copied to their destination, applications are considered 'untrusted'. The first time an application is launched, OS X issues a warning to the user. If the user approves, the application is launched and thereafter considered 'trusted'.

Spotlight only loads importer plugins belonging to trusted applications.

Control

In addition to access through system preferences, the user has several command line tools which can be used to control Spotlight. mdutil is the foremost; following is the general syntax.

mdutil [-pEs] [-i on | off] <volumes>

Following is a list of common commands.

Command                   Description
mdutil -s <volumes>       Displays the indexing status of <volumes>.
mdutil -E <volumes>       Erases the local stores for <volumes> and rebuilds them if appropriate.
mdutil -i on <volumes>    Sets the indexing status of <volumes> to on.
mdutil -i off <volumes>   Sets the indexing status of <volumes> to off.
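
For anyone scripting these commands, a small Python wrapper along the following lines may be convenient. It uses only the flags listed above; the function names `mdutil_args` and `run_mdutil` are invented for this sketch, and `run_mdutil` will only work on a system where mdutil is installed.

```python
import subprocess


def mdutil_args(volume, *, status=False, erase=False, indexing=None):
    """Build the argument list for one mdutil invocation."""
    args = ["mdutil"]
    if status:
        args.append("-s")
    if erase:
        args.append("-E")
    if indexing is not None:
        args += ["-i", "on" if indexing else "off"]
    args.append(volume)
    return args


def run_mdutil(volume, **options):
    """Run mdutil and return its output; changing settings may need root."""
    result = subprocess.run(
        mdutil_args(volume, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_mdutil() to execute it.
    print(" ".join(mdutil_args("/Volumes/Backup", indexing=False)))
```

Building the argument list separately from running it keeps the logic testable on machines that do not have mdutil at all.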

The user can also issue direct queries with mdfind, check importer schema with mdcheckschema, list a file's metadata attributes with mdls, and import metadata into the system store with mdimport.
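
As an illustration of driving mdfind from a script, the sketch below builds a query command and splits the tool's output, which lists one matching path per line, into a Python list. The helper names are invented; the query string in the example is only a plausible sample, and the `-onlyin` option, which restricts a search to one directory, should be checked against the mdfind manual page on the system in question.

```python
import subprocess


def mdfind_args(query, only_in=None):
    """Build the argument list for an mdfind query."""
    args = ["mdfind"]
    if only_in:
        args += ["-onlyin", only_in]
    args.append(query)
    return args


def parse_mdfind_output(output):
    """Turn mdfind's one-path-per-line output into a list of paths."""
    return [line for line in output.splitlines() if line.strip()]


def find(query, only_in=None):
    """Run the query; requires a system with mdfind installed."""
    result = subprocess.run(
        mdfind_args(query, only_in), capture_output=True, text=True, check=True
    )
    return parse_mdfind_output(result.stdout)


if __name__ == "__main__":
    # Only prints the command that would be executed.
    print(mdfind_args('kMDItemTitle == "Budget*"', only_in="/Users/Shared"))
```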

Apple

Apple are proud of Spotlight technology, calling it 'a watershed in operating system history' and claiming their 'Tiger' is 'the first industrial-strength operating system to feature a fully integrated, fast, and efficient search across all files on a system'.

Apple also go to pains to point out that Spotlight is not an add-on to 'Tiger'.

Make no mistake about it, Spotlight isn't 'bolted on' to the system. It's a completely new search technology that is tightly integrated with a fundamental part of the OS: the file system.

Every time a file is created, saved, moved, copied, or deleted, the file system automatically ensures that the file is properly indexed, cataloged, and ready for whatever search query might be issued - all in the background.

These abilities build on the already impressive capabilities of the journaled HFS+ file system.

Others

Others may not share Apple's enthusiasm. The recent spat over Google Desktop has left many security pundits scarred; these same pundits are now wondering why Spotlight has to index 'per volume' instead of 'per user', why it places its store in the root directory of each partition or hard drive instead of in the home directory of each login, and how private and secure Spotlight really is.

If you're worried about visits in the night from authorities intent on examining your hard drive for compromising data, it will no longer be enough to shred all your sordid documents and wipe the file and disk slack: Spotlight will leave trails that can do you in.

Any member of the admin group can access (and extract) the Spotlight stores at any time, including the data they hold on other users of the same computer.

Spotlight keeps only a limited number of files in the root of every accessed drive - in a hidden directory called '.Spotlight-V100' [sic].

-rw-------    1 root  admin       0 Apr 29 00:00 .journalHistoryLog
-rw-------    1 root  admin  151552 Apr 29 00:00 .store.db
-rw-------    1 root  admin     238 Apr 29 00:00 _IndexPolicy.plist
-rw-------    1 root  admin     260 Apr 29 00:00 _exclusions.plist
-rw-------    1 root  admin     378 Apr 29 00:00 _rules.plist
-rw-------    1 root  admin  151552 Apr 29 00:00 store.db

Even though the directory is marked in a curious way (0600) it's fairly easy to get the files out and investigate them; data belonging to all users is contained therein. Administrators can keep track not only of what users are doing but also of what they have been doing.
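
For readers unfamiliar with octal modes, the following Python sketch decodes 0600 for a regular file and for a directory. It suggests one reading of why the marking is 'curious': on a directory, 0600 grants read and write but not the search (execute) bit normally needed to enter it. The helper name `describe_mode` is invented for the example.

```python
import stat


def describe_mode(mode):
    """Return the ls-style string and whether the owner has the execute/search bit."""
    return stat.filemode(mode), bool(mode & stat.S_IXUSR)


if __name__ == "__main__":
    # Mode 0600 on a regular file: owner may read and write, nobody else.
    print(describe_mode(stat.S_IFREG | 0o600))
    # The same bits on a directory: no search bit, even for the owner.
    print(describe_mode(stat.S_IFDIR | 0o600))
    # A conventional private directory would be 0700.
    print(describe_mode(stat.S_IFDIR | 0o700))
```

The root account is not bound by these bits, so an account able to act as root can read the files regardless of the mode.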

And up to now Spotlight has shown a few rough edges: removable drives can be 'excluded' from indexing only until such time as they are inserted again. Removing Spotlight stores from such a drive - and shutting Spotlight off each time - can be a nuisance.

Copyright © Rixstep. All rights reserved.