Tuesday, 2 February 2010

De-anonymizing social network users by using browser history to determine group membership

Researchers from isecLab, an international academic collaborative institution, have described a practical way for a third party website to de-anonymise social networking users who visit the third party site, based on their membership of groups in the social network (what Heise Security call their "group fingerprint").

It's already well known from Narayanan & Shmatikov's work that users of social networks can be re-identified through their "social graph", i.e. their network of connections and contacts.

The isecLab research shows that it's possible for a malicious website to determine which groups a social networking user belongs to, and therefore who they are (or at least their profile name on the social network), by "stealing" a visitor's browser history to check which particular webpages they went to on the social networking site (my emphasis):

"information about the group memberships of a user (i.e., the groups of a social network to which a user belongs) is often sufficient to uniquely identify this user [when visiting web pages from third parties], or, at least, to significantly reduce the set of possible candidates. To determine the group membership of a user, we leverage well-known web browser history stealing attacks. Thus, whenever a social network user visits a malicious website, this website can launch our de-anonymization attack and learn the identity of its visitors…

about 42% of the users that use groups can be uniquely identified, while for 90%, we can reduce the candidate set to less than 2,912 persons."

Or put another way:

"using history stealing, an attacker can probe the browser history of a victim for certain URLs that reveal group memberships on a social network. By combining this information with previously collected group membership data from the social network, it is possible to de-anonymize any user (of this social network) who visits the attacker’s website. In some cases, this allows an attacker who operates a malicious website to uniquely identify his visitors by their name (or, more precisely, the names used on the corresponding social network profiles)…

The approach presented in this work allow a malicious user to launch de-anonymization attacks against a large number of victims with relatively little effort. Whereas history stealing by itself is often not enough to identify individual users, combined with the misuse of group membership information stored in social networks, it becomes a critical weakness."
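To make the "group fingerprint" idea concrete, here is a minimal Python sketch of the narrowing-down step. It is only an illustration and not necessarily the researchers' exact method. The group names, the profile names and the `candidates_for` helper are all made up for the example. It assumes the attacker has already collected a map of groups to members, and has learned a set of groups the visitor belongs to.

```python
def candidates_for(fingerprint, members_by_group):
    """Return the profile names that belong to every known group in the fingerprint."""
    # Ignore groups the attacker has no membership data for.
    member_sets = [members_by_group[g] for g in fingerprint if g in members_by_group]
    if not member_sets:
        return set()
    # Only users present in all of the groups remain candidates.
    return set.intersection(*member_sets)


# Hypothetical crawled data: group name -> set of profile names.
members_by_group = {
    "cycling-berlin": {"alice", "bob", "carol"},
    "python-devs": {"alice", "carol", "dave"},
    "privacy-law": {"alice", "erin"},
}

two_groups = candidates_for({"cycling-berlin", "python-devs"}, members_by_group)
three_groups = candidates_for(
    {"cycling-berlin", "python-devs", "privacy-law"}, members_by_group
)
print(sorted(two_groups))    # ['alice', 'carol']
print(sorted(three_groups))  # ['alice']
```

Each extra group in the fingerprint can only shrink the candidate set, which is why a handful of group memberships may be enough to single out one profile.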

Why does it matter? Such attacks can facilitate targeted phishing attempts and social engineering efforts to spread malware (with personalised messages). They are also a concern given that, in the recent China hacking attacks on Google and others, it seems some people with privileged access were targeted through their social networks. As the researchers note:

"many people in political or corporate environments use social networks for professional communication (e.g., LinkedIn). Identifying these “high value” targets might be advantageous for the operator of a malicious website, revealing sensitive information about these individuals. For example, a politician or business operator might find it interesting to identify and de-anonymize any (business) competitors checking her website. Furthermore, our attack is a huge privacy breach: any website can determine the identity of a visitor, even if the victim uses techniques such as onion routing [14] to access the website – the browser nevertheless keeps the visited websites in the browsing history."

Note that these attacks can only test whether particular known webpage addresses are in the browser history; they can't simply read the full browser history. So the attacker would first have to collect the URLs of groups from the targeted social networking site(s), then get members of that site to visit the attacker's own site, which checks their browser history to see which group pages they have visited. The researchers measured how quickly this probing can be done:

"Safari on both Mac OS X and Windows achieved the best results in our experiments: A history stealing attack with 90,000 tests can be performed in less than 20 seconds. Chrome is about 25% slower, while Firefox requires between 48 and 59 seconds, depending on the operating system. The slowest performance was measured for Internet Explorer, which took 70 seconds to probe all pages. Nevertheless, even for Internet Explorer, we could probe more than 13,000 URLs in less than 10 seconds. Together with the results from Figure 3, this show that an attacker can detect many groups of a victim in a small amount of time."
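The probing step can be sketched in Python as a series of yes/no questions. This is a simulation only: in a real attack the answer comes from a browser side channel, which is stood in for here by a hypothetical `simulated_is_visited` function. The URLs and helper names are invented for the example. The rate calculation uses the quoted Safari figure and assumes a constant probing rate, which the quoted numbers don't guarantee.

```python
def probe_history(candidate_urls, is_visited):
    """Ask one yes/no question per URL; the full history is never read."""
    return [url for url in candidate_urls if is_visited(url)]


def probe_seconds(num_urls, urls_per_second):
    """Rough time estimate, assuming a constant probing rate."""
    return num_urls / urls_per_second


# Simulated victim history; the attacker cannot see this set directly.
_victim_history = {
    "https://social.example/groups/101",
    "https://social.example/groups/305",
    "https://news.example/today",
}


def simulated_is_visited(url):
    """Hypothetical stand-in for the browser's visited-link side channel."""
    return url in _victim_history


# Group URLs the attacker collected from the social network beforehand.
group_urls = [f"https://social.example/groups/{n}" for n in (101, 202, 305, 404)]
hits = probe_history(group_urls, simulated_is_visited)
print(hits)

# Quoted Safari figure: 90,000 tests in (less than) 20 seconds.
safari_rate = 90_000 / 20
print(probe_seconds(13_000, safari_rate))
```

The attacker learns nothing about the news site visit because that URL was never on the probe list, which is why the list of group URLs has to be gathered first.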

For their tests, the researchers used volunteers on the XING network (which has over 8 million registered users), but their other experiments suggested that:

"users of Facebook and LinkedIn are equally vulnerable (although attacks would require more resources on the side of the attacker). An analysis of an additional five social networks [including MySpace and Friendster] indicates that they are also prone to our attack."

What defences are available against these attacks?

Social networking sites could add random tokens to links, or otherwise randomise links, so that an attacker can't predict the URLs to probe for. But this affects usability, e.g. when bookmarking links to groups.
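As a rough illustration of the random-token idea, the Python sketch below appends an unguessable token to each group link, so that the URL recorded in the browser history differs from anything an attacker could predict. The `make_group_link` helper, the `t` query parameter and the example domain are assumptions made for this sketch. A real site would also have to avoid redirecting to a predictable URL.

```python
import secrets


def make_group_link(base_url, group_id, token_bytes=16):
    """Build a group link carrying a fresh random token (hypothetical URL scheme)."""
    token = secrets.token_urlsafe(token_bytes)
    return f"{base_url}/groups/{group_id}?t={token}"


link_a = make_group_link("https://social.example", 101)
link_b = make_group_link("https://social.example", 101)
print(link_a)
print(link_a != link_b)
```

Because every generated link is different, a bookmarked or shared link no longer matches the link another user sees, which is the usability cost mentioned above.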

Browsers could be improved to try to prevent history stealing, but appropriate measures (e.g. applying the same-origin policy to visited links) either haven't been adopted by the major browsers or would break functionality on legitimate sites. Disabling JavaScript, or using extensions like NoScript in Firefox, isn't guaranteed to work because some attacks can succeed without JavaScript, although it may help to some extent.

Anonymous browsing using e.g. Tor won't help. At the moment it seems that all users can do to protect themselves is to disable or regularly delete their browser history, and/or use "private browsing" so that no history is saved when visiting social networking sites (see Firefox private browsing; Internet Explorer 8 "InPrivate" browsing; Chrome incognito mode). This obviously has functionality, usability and convenience costs for the user.


©WH. This work is licensed under a Creative Commons Attribution Non-Commercial Share-Alike England 2.0 Licence. Please attribute to WH, Tech and Law, and link to the original blog post page. Moral rights asserted.