
Data Trails

Shocking news that won't shock you.

Rick Rashid is Microsoft's VP of research. He comes from Carnegie Mellon University where, together with Avie Tevanian, currently head of software at Apple and formerly head of software at NeXT, he put together the Mach kernel.

Rick Rashid's research teams in Beijing and Cambridge have been working on search technologies for the past ten years and, what with a recent acquisition, feel about ready to release trial versions of some of this to Windows users.

The technologies are called Stuff I've Seen, Sapphire, and Memory Lens Browser. As Rashid points out, they could basically track anything - and it's important to see that there is a fine line at best between collating data trails and supplying data for future data searches.

They can save the coordinates of all application windows open at any one time, the web pages that are open, the documents that are being edited, the contents of this constellation of files being edited.

They can even save data such as whether you used the menu or a toolbar to save a file, the embedded links you inadvertently loaded when you loaded a web page, the passwords you sent to your bank or credit card company - the works.

Knowing how easy it is to hack Windows - or any system for that matter when it comes to data on this level - none of the above should be a surprise.

What is instructive is that this research began ten years ago - before the birth of the World Wide Web as we know it today. Yes, the WWW existed before 1995, but these past ten years have seen the Internet grow into what it is today, and its user base grew only in very small increments before 24 August of that year.

Apple are touting the next version of their OS X already. They're trying to get developers to opt in to their 'select' programme where for US$500 you get a preview version of an operating system upgrade that will only cost US$129.

But what's interesting is their blurb on a coming technology known as Spotlight.

'Showing up prominently as an icon on the top right of the menu bar, Spotlight is the first new feature of Tiger that most end users will notice. As detailed on the Mac OS X Tiger Sneak Preview site, it offers fast and intuitive searches across all of the data on your system. Spotlight, and many of the other new features in Tiger such as Smart Folders, is built on an underlying set of system services that collect, update, and index both the contents and the meta-data of the files on your system.

'In a nutshell, every time a file is saved, it is examined for meta-data and content, which is then placed into an indexed database.'

Apple users are already aware of the Finder index files lurking around out there. These files are part of an 'opt in' technology. When looking at info for folders, Finder can tell you there are no indexes and ask you if you want to create them - and if you don't, nothing happens.

And even if you did index some folders, you can always destroy the indexes on your own with a simple Unix command line script.

sudo find / -name ".FBCIndex" -exec rm {} \;

If you were security paranoid, you could purge them.

sudo find / -name ".FBCIndex" -exec rm -P {} \;

If you were a real bona fide spook, you could use SPX:

sudo find / -name ".FBCIndex" -exec spx {} \;

In any case, they're easily taken care of - and CLIX already has scripts for this.
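
Before running any of the destructive variants above, it may be worth listing what they would hit. What follows is only a minimal sketch and not one of the CLIX scripts; the function name list_fbc_indexes and its optional root argument are assumptions introduced here for illustration.

```shell
# list_fbc_indexes [ROOT]: print every .FBCIndex file under ROOT (default /)
# without deleting anything. Hypothetical helper, for illustration only.
list_fbc_indexes() {
    # -type f limits matches to regular files; errors from unreadable
    # directories are discarded so the listing stays clean.
    find "${1:-/}" -name ".FBCIndex" -type f -print 2>/dev/null
}
```

Called as list_fbc_indexes "$HOME" it only walks your own folders; a full sweep from / would need a root shell, as the sudo in the commands above implies.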

But next time around, things might go deeper - in particular, it's the phrases 'an underlying set of system services' and 'every time a file is saved, it is examined for meta-data and content' that make one wonder.

What kind of files will be created to support this database? Will they still be owned by the login user? What about indexes for other system directories? Will the technology only be used in conjunction with file saves, or will it try to go a lot further, as Avie's buddy Rashid at Microsoft wants to?

Jeremy Bryan Smith is a former Windows user; shortly after making the following discovery he left the platform for good and now runs Linux.

Smith found a very suspect key in his Windows Registry called 'UserAssist'. The UserAssist key seems to have two subkeys which are registered CLSIDs.


Smith discovered that the data at these keys was both:

  • egregious; and

  • encrypted.

The encryption was the infinitely infantile Caesar cipher 'ROT-13', which moves each letter forward in the alphabet by thirteen notches (wrapping around modulo twenty-six).
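
To show just how little effort the decoding takes, here is a minimal sketch using the standard tr utility. The function name rot13 and the sample string are illustrative assumptions; the string is not taken from Smith's Registry.

```shell
# rot13: read text on stdin and write it back with every ASCII letter
# shifted thirteen places, wrapping past Z/z. Non-letters pass through.
rot13() {
    tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

# Because 13 + 13 = 26, applying rot13 twice restores the original text,
# so the same function both "encrypts" and "decrypts".
printf '%s\n' 'Uryyb, Jbeyq' | rot13   # prints: Hello, World
```

Anyone with a Unix shell and a copy of the exported key data could undo the scrambling this way in a single pipeline.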

Once he'd decrypted the data at these keys, he found references to files he'd seen online, to files he'd saved, to shortcuts he'd created on his desktop, to programs he'd run...

This wasn't a preview of Rick Rashid's coming search technologies. This was an out of the box edition of Windows 2000 he'd had since 1999.

All told, Smith found 18,497 [sic] 'log entries' in the one key and a petty 394 in the other, making for a whopping total of nearly 19,000.

Smith searched the web too - and came up with a single German language website which referred to this suspicious key - and this German language site referred in turn to microsoft.com, where a knowledge base article briefly discussed it in the context of a 'test' version of Word 2000 known as the 'Instrumented Version'.

The purpose of Word2KIV seems to have been to cull focus group data. It ran for thirty days, whereupon users had to revert to the ordinary version. It culled data and kept it at the UserAssist key. After the trial period, users were evidently supposed to put this stuff on diskettes or CDs and send it in.

The data would include really detailed and trivial stuff like how you use a command - with menu, shortcut or toolbar - or what commands you use most often - stuff that one can expect Microsoft to have some interest in knowing more about.

But what is indicative is that Jeremy Bryan Smith had never heard of this version of Word. Further, Smith tested this key on a 'wipe and reinstall' situation just to see what was going on.

As soon as he right-clicked on My Computer, the key came back. Its function is somehow coupled with use of Windows Explorer and Internet Explorer. It was used by the Word team obviously - but it's been silently collecting data on every Windows user's activities - names of files saved, web pages visited - since 1999.

For the past five years.

Google, as always, inspire fear in the hearts of their competitors. Suddenly search is all the rage. And now that Google have released their Google Desktop Search (for Windows only) the race is on, and people are panicking.

But it isn't just Microsoft, Apple, Yahoo, and AOL that are panicking - users are panicking too. In a nutshell, to borrow words from the Apple Tiger Spotlight blurb, things are a royal mess.

Google have had a serious cross site scripting vulnerability for the past two years that no one at Google has ever expressed any interest in fixing.

Jim Ley of Jibbering.com started writing to Google two years ago; unfortunately, Google's mail address for security advisories, security@google.com, bounces all the mail back again.

Ley now has two authentic addresses he can use; he has seen some activity at Google; but both he and others have since been able to poke egregious holes in the search engine's interface - and they're serious exploits as well.

Writes Ley:

'Google, stop releasing products, get all your developers into a room, get some good developers who understand security to explain it to everyone. Then review all your code and sites, get some tests written, get defensive and sort your security out now, before exploits start actually getting used. At the moment it's ridiculously easy to find exploits.

'Users, uninstall Google Desktop, make Google a 'restricted site' in IE so script is disabled. Go to 'tools - security - restricted sites' and add *.google.com. Other browser users do the same. And start looking for different search or email solutions.'

But that's only the murkier background; most of the dirt of today has to do with this new monster known as Google Desktop Search.

Hierarchical file systems are like file cabinets - except the drawers can have sub-drawers, and those sub-drawers can have their own sub-drawers, and so forth.

The key - shockingly enough - is 'intelligence': you're supposed to use something called 'brains' when you chart out your own hard drive topology. And yet it seems as if this minor lesson has been lost on the great mass of humanity.

If you archive a lot of games, you might put them all in a folder called - hold on - 'games'. If you have movie files, you might put them in 'Movies'.

Which is exactly what both Apple and Microsoft do today, and yet - if you believe Microsoft - ordinary users get vertigo if a path contains more than two levels of hierarchy.

If you decide on an 'order' to things - if you weren't born confused - then you should be able to extrapolate your order system if you need to.

If you save things around the house, you're able to find them; if you put things in your fridge, you find them. But on a hard drive? Forget it, right?

Support ticket on an IM:

'OK, go to acme.com and click the download link.'


[Silence - long silence.]

'Did you get there?'

'Yes, I'm downloading now.'

'OK, tell me when the download is finished.'

'OK, it's finished. Now what?'

'Now you unzip the file.'

'Where is the file?'

'You don't know where it is? You just downloaded it!'

'Yeah, I know, but I forgot to look.'

[Silence - long silence.]

'OK, so go back to that URL again and download again.'

'I forgot the URL.'

What do the press say about Google Desktop Search?

Google Spyware? Bad Guys & Spies Using Google Desktop Search
'I suppose I was naive when I cheered the new Google Desktop Search tool... The Google Desktop Search tool poses a security risk to users of public or networked computers according to a new Information Week article. If you use public computers at work or at libraries, internet cafes, Kinko's or the local Mailboxes Etc. store, now you've got to worry that previous users of that public machine, or worse, the business owner or employees, have installed Google Desktop Search on that machine to purposely spy on you!'

New Google Search Tool Poses Security Risk
'Type in 'hotmail.com' and you'll get copies, or stored caches, of messages that previous users have seen. Enter an e-mail address and you can read all the messages sent to and from that address. Type 'password' and get password reminders that were sent back via e-mail.'

Assessing The Security Threat Of Google's Desktop Search
'Google's Desktop Search doesn't come with a warning label, but perhaps it should. 'This isn't a great application for cybercafes or library terminals', says Marissa Mayer, director of consumer Web products at Google Inc.

''It's a double-edged sword', says Richard Smith, an Internet privacy and security consultant. 'It's great for organising. It's a wonderful tool. The downside is it's also a spying tool.''

Netcraft: Phishing Attacks possible on Google
Netcraft: New Google Desktop Exploit Discovered

No matter your platform, take the advice about Google seriously. Have they fixed their cross site scripting holes? Are there more holes? Are they no better at these things than Microsoft? If so - watch where you surf.

Consider not doing online banking or any online finances. If you still do it - are you sure you know what you're looking at?

Make sure your vendor gives you a way to 'opt out' of all this hysterical search stuff everyone wants to get on your desktop.

And remember: if Microsoft were not up to no good with their UserAssist - why did they encrypt the data? Why did they try to hide it?

And ponder: have Microsoft in fact ever sent this type of data to a mother ship? With MPA it is feasible, although no one has ever succeeded in fully decrypting the transmissions.

Further transmissions occur all the time with Windows Update - and in the background too. And given these circumstances, isn't it a bit curious that the much touted new version of Windows Firewall still does not have egress control?

You can't rely on classic secure delete document shredders anymore. Yes, if you 'finalise' a disk - shred all disk and file slack - you're closer to protected, but the caches can still be on disk, lurking there without your knowledge, never deleted or shredded.

As far as UserAssist goes, there seems to be no way to turn it off. It's embedded way too deep.

One must hope Microsoft, Apple, Google, AOL, Yahoo, and the rest continue to give users a way to turn these data trails off.

Afterword: a search at microsoft.com for 'UserAssist' yields only the single article referenced by the German language site and Jeremy Smith.

For all practical purposes it remains undocumented.

Copyright © Rixstep. All rights reserved.