Wednesday, 27 January 2010

"Personal data" - browser fingerprint, not just IP addresses


Never mind the data protection debates about whether computer IP addresses (with or without notifying the provider) constitute "personal data" or "personally identifiable information" (PII): the US's Electronic Frontier Foundation are showing, through their Panopticlick project, how easy it is to track you by your browser's "fingerprint". (The name is a play on "Panopticon", a prison design in which a guard can watch any prisoner while the prisoners can't tell whether they're being watched.)

Try it yourself to get your browser's "uniqueness measurement", compared with the browsers of others who've tested it. That will also help with their research into the privacy risks posed by browser fingerprinting (and they will anonymize your info). It seems Internet tracking and advertising companies are already using these kinds of web browser tracking techniques to record and track people's online activities.

The theory behind this is intuitively obvious - the more facts that someone knows about you, the more likely it is that they can identify exactly who you are (it's been shown that zip code, birth date and gender combined were enough to uniquely identify 87% of the US population).

In maths terms, to uniquely identify 1 person out of the current world population of nearly 7 billion, you need about 33 bits of identifying information (2 to the power of 33 is about 8.6 billion). Each fact you learn about a person reduces the "entropy" of their identity. I'll spare you the formula! As the EFF explains, bits of entropy are about how large a crowd the information would single you out within: 10 bits of identifying information would allow you to be ID'd from a crowd of 2 to the power of 10, or 1,024, people; 3 bits would identify 1 person uniquely within a group of 8 people, and so on.
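To make that arithmetic concrete, here's a minimal Python sketch (the function names are my own, purely for illustration). The bits needed to single one person out of a population is the base-2 logarithm of the population size; going the other way, n bits single you out within a crowd of 2 to the power of n.

```python
import math

def bits_needed(population: int) -> float:
    """Bits of identifying information needed to single out one person."""
    return math.log2(population)

def crowd_size(bits: float) -> float:
    """Size of the crowd that `bits` of information would single you out within."""
    return 2 ** bits

print(f"{bits_needed(7_000_000_000):.1f}")  # 32.7, i.e. about 33 bits
print(crowd_size(10))  # 1024
print(crowd_size(3))   # 8
```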

The same principle can be applied to web browsers. Every web browser has particular characteristics. When you visit a webpage, your browser sends the web server information that includes a User-Agent header, which reveals some of those characteristics, such as the browser's name (e.g. Internet Explorer), operating system (e.g. Windows XP) and browser version number (e.g. 3.5.7).
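For illustration, here's roughly what such a request looks like, and how a server can pull the User-Agent out of it. The request below is hypothetical, written in the format a Firefox 3.5.7 on Windows XP would use ("Windows NT 5.1" is how XP identifies itself), and the hand-rolled parser is only a sketch; a real server would rely on its web framework's own header handling.

```python
# A hypothetical raw HTTP request, in the style of Firefox 3.5.7 on Windows XP.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.7) "
    "Gecko/20091221 Firefox/3.5.7\r\n"
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
    "Accept-Language: en-gb,en;q=0.5\r\n"
    "\r\n"
)

def parse_headers(raw: str) -> dict[str, str]:
    """Return the request's headers as a dict keyed by lower-cased header name."""
    head = raw.split("\r\n\r\n", 1)[0]
    header_lines = head.split("\r\n")[1:]  # skip the request line ("GET / HTTP/1.1")
    headers = {}
    for line in header_lines:
        # Split on the first colon only, since values (e.g. "rv:1.9.1.7") may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

headers = parse_headers(RAW_REQUEST)
print(headers["user-agent"])
```

Note that the same request also carries the Accept headers, which the EFF count as part of the fingerprint too.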

The EFF have found that User-Agent strings contain about 10.5 bits of identifying information on average (typically somewhere between 5 and 15 bits), so only about 1 person in 1,500 (2 to the power of 10.5) has the same User-Agent as you do. 10.5 bits isn't much out of the 33 needed, since you're still hidden among roughly 1,500 people, but if you combine that with other info, like your geographical location and which browser plugins are installed, it all starts to add up (bits from independent facts simply add together).
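Here's a rough Python sketch of that "adding up". The 10.5-bit User-Agent figure is the EFF's average; the location and plugin figures are made up purely for illustration. Simply adding bits together assumes the traits are independent of each other, which in reality isn't always so (your operating system, for instance, says something about which plugins you're likely to have).

```python
import math

def bits_from_fraction(fraction_sharing: float) -> float:
    """Bits revealed by a trait shared by `fraction_sharing` of people: -log2(fraction)."""
    return -math.log2(fraction_sharing)

# The EFF's average for User-Agent strings.
ua_bits = 10.5
print(round(2 ** ua_bits))  # 1448: roughly 1 person in 1,500 shares your User-Agent

# Hypothetical figures for other traits, for illustration only.
location_bits = bits_from_fraction(1 / 64)   # 6 bits: shared by 1 person in 64
plugins_bits = bits_from_fraction(1 / 256)   # 8 bits: shared by 1 person in 256

# Adding is only valid if the traits are statistically independent;
# correlated traits reveal less in total than this simple sum suggests.
total_bits = ua_bits + location_bits + plugins_bits
print(total_bits)  # 24.5
```

At 24.5 bits you'd be singled out within a crowd of over 20 million, a long way closer to the 33 bits needed to pick you out of the whole world.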

Even if you reject or delete browser cookies, and even if you hide your IP address using tools like Tor (which the EFF has backed), your browser's User-Agent gives away quite a lot of info.

The EFF are now extending their research from User Agents to include other web browser info that can be collected and analysed by web servers, info which together make up the "fingerprint" of the browser:

  • The user agent string
  • The HTTP ACCEPT headers sent
  • Screen resolution/size and color depth (time zone too, it seems)
  • The browser extensions/plugins or addons, like Quicktime, Flash, Java or Acrobat, that are installed, and their versions
  • The fonts installed on the computer, as reported by Flash or Java
  • Whether your browser runs JavaScript
  • Yes/no information saying whether the browser accepts various kinds of cookies and "super cookies"

(plus "housekeeping" information to help them analyse the data: cookies to recognise repeat visits, and encrypted IP addresses and timestamps; see their privacy policy for full details).
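One simple way a site could boil all those attributes down to a single fingerprint is to serialise them in a fixed order and hash the result. This Python sketch is my own illustration, not the EFF's actual code; the attribute names and values (including the plugin versions and fonts) are hypothetical.

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a browser's attributes into one fixed-length identifier.
    Keys are sorted, so the same attributes always give the same hash."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical attributes for one browser, mirroring the list above.
browser = {
    "user_agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.7) "
                  "Gecko/20091221 Firefox/3.5.7",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "screen": "1280x1024x24",
    "timezone_offset_minutes": 0,
    "plugins": ["Shockwave Flash 10.0 r42", "Java 1.6.0_17"],
    "fonts": ["Arial", "Calibri", "Verdana"],
    "javascript": True,
    "cookies_enabled": True,
}
print(fingerprint(browser))
```

The hash on its own says nothing about how unique you are; that comes from counting how many other visitors produced the same hash, which is what comparing your browser against everyone else's tests amounts to. Change a single font or plugin version and you get a completely different hash.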

When I tried their Panopticlick, my browser was unique among the more than 33,000 browsers tested so far (word must be spreading; it was only a couple of hundred when I first tried it this afternoon!):

"Currently, we estimate that your browser has a fingerprint that conveys at least 15.03 bits of identifying information."
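The "at least" makes sense if you're unique in the sample: the best anyone can say is that fewer than 1 in N browsers share your fingerprint, which means at least log2(N) bits. I'm assuming that's how the figure was worked out; it fits the numbers, as this quick Python check suggests.

```python
import math

def min_bits_if_unique(sample_size: int) -> float:
    """Lower bound on identifying bits when no other browser in a sample of
    `sample_size` shares your fingerprint: log2(sample_size)."""
    return math.log2(sample_size)

# About 33,000 browsers tested, with mine unique among them:
print(f"{min_bits_if_unique(33_000):.2f}")  # 15.01

# Working backwards from the quoted 15.03 bits gives a sample size
# a little over 33,000, consistent with the count at the time.
print(f"{2 ** 15.03:,.0f}")
```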

Of course, this only identifies the individual browser used, not the person using it. But if you visit a website several times using the same browser, and the site uses the same sorts of techniques as the EFF are trying in their research, you may well be fingered as the same person, and thereafter tracked, from your browser's fingerprint, even if you change your IP address. That's especially likely on a site where you log in, because it can link the real name or identity you use there to your browser's fingerprint (e.g. because it plants a cookie too).
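Here's a toy Python illustration of that linking. The visit log, fingerprint hashes and login name are all hypothetical (the IP addresses come from the ranges reserved for documentation). Grouping visits by fingerprint rather than by IP address ties three visits from three different addresses to one browser, and a single login is then enough to put a name to all three.

```python
from collections import defaultdict

# Hypothetical visit log: (IP address, browser fingerprint hash, login name or None).
VISITS = [
    ("192.0.2.10", "a3f9c2", None),
    ("198.51.100.7", "a3f9c2", None),
    ("203.0.113.42", "a3f9c2", "jane.doe"),
    ("192.0.2.99", "77b1e0", None),
]

def group_by_fingerprint(visits):
    """Group visits by browser fingerprint, ignoring which IP address they came from."""
    grouped = defaultdict(list)
    for ip, fp, login in visits:
        grouped[fp].append((ip, login))
    return dict(grouped)

def names_for_fingerprints(grouped):
    """For each fingerprint, collect any login names seen alongside it."""
    return {fp: {login for _, login in entries if login} for fp, entries in grouped.items()}

grouped = group_by_fingerprint(VISITS)
names = names_for_fingerprints(grouped)
print(len(grouped["a3f9c2"]), names["a3f9c2"])  # 3 {'jane.doe'}
```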

The EFF suggest some methods that users could try to prevent or reduce this kind of tracking.

It seems we're heading more and more towards "Everything could well be personal data", especially with the ability to link more and more information.

©WH. This work is licensed under a Creative Commons Attribution Non-Commercial Share-Alike England 2.0 Licence. Please attribute to WH, Tech and Law, and link to the original blog post page. Moral rights asserted.