Friday, 26 November 2010

ICO - fixing broken links on your site to ICO papers or press releases

UPDATE - to find the new URL, just enter the old URL in this form - blogged here.

If your blog or website has links to files (press releases, documents) on the UK Information Commissioner's site, here is a heads up that links from about October 2010 may be broken.

Anyone who read this blog a year ago will know that linkrot - ie sites, especially official sites, breaking or killing links to their site eg on a site revamp - sets my teeth on edge. Yeah, sometimes I just have to channel my inner law librarian. But really, old URLs should not be changed. It's a usability nightmare that is all too common. Surely it would not be difficult to redirect links in the old format to the new URLs on the ICO site. See my suggested (Word-specific) regex below, for instance.

The ICO must have tweaked their site recently. Links to webpages still work; it's just links to pdfs that don't.

For instance, any link to -

won't work now; the link should instead be to -

In other words, in the URL change "upload" to "~/media" and change "pdf" at the end to "ashx".
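The rewrite rule above can be sketched in a few lines of Python. This is only an illustration of the two substitutions described, not something tested against the ICO site; the function name and the example.org URL are hypothetical.

```python
def to_new_ico_url(old_url: str) -> str:
    """Apply the two substitutions described above to an old-format PDF link."""
    # Leave anything that doesn't match both parts of the old pattern untouched.
    if "/upload/" not in old_url or not old_url.lower().endswith(".pdf"):
        return old_url
    # Change the first "upload" path segment to "~/media" ...
    new_url = old_url.replace("/upload/", "/~/media/", 1)
    # ... and swap the trailing "pdf" for "ashx".
    return new_url[: -len("pdf")] + "ashx"


# Hypothetical URL for illustration only, not a real ICO document path.
print(to_new_ico_url("http://www.example.org/upload/documents/sample_report.pdf"))
```

Links that don't match the old pattern (eg ordinary webpages) are returned unchanged, which fits the observation that only PDF links broke.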

From my very limited testing, links (even to PDFs) in the old format to files published before about October 2010 do still work, although changing them to the new format also won't break them - but note that I don't guarantee either point!

Luckily, most of us shouldn't have too many links to PDFs on the ICO site from about 1 Oct 2010 to have to fix.

I updated the ICO links on this blog manually. Below is what I did, in case it helps anyone else who uses Blogger as their blogging platform; the same method may be of assistance in relation to updating links for the recent changes (see below) as well as for any other blogs you might need to update "en masse" (ish) on Blogger in future -

  1. The Blogger Data API doesn't support the q text search parameter - else I'd have tried coding something in Java or Python to find and download the relevant posts, update the links and re-upload them.
  2. So you can instead try downloading the XML file of all blog posts (Dashboard > Settings > Basic, it's the Export blog link against Blog Tools - good to use it for regular backup generally, anyway - then Download Blog).
  3. Open the XML file in eg Word.
  4. Search in Word for to find the broken links, or more precisely, in order to note down the titles of the blog posts which contain those links - as mentioned above, it seems you won't have to go back beyond about 1 Oct 2010, but I don't guarantee that.
  5. You could perhaps use regular expressions in Word (tick "Use wildcards" in Word's Find and replace box) to find - 
    and replace it with -
    - then save the XML file and re-import it into Blogger. But
  6. There's a Blogger problem with importing the edited XML file. I got the dreaded error bX-tjg9ds with a test blog that had just 4 test blog posts. Plus, I'm a bit nervous about trying to export and re-import the entire blog, especially given the error.
  7. So in the end I just searched the downloaded XML file to find all the broken links as mentioned in 4, ie blog posts published in Oct/Nov 2010, then I signed in to Blogger, and in Edit Posts I found the relevant blog posts and updated them manually in HTML view. (Or if you use the excellent Windows Live Writer you could perhaps retrieve each relevant published post from Blogger, edit it manually, then republish it; but I didn't do that myself as I haven't had time to test whether it would republish with the same timestamp or not. I believe it would, but try it at your own risk!)
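As an alternative to searching the export in Word (steps 3 and 4), here is a rough Python sketch that lists the titles of posts containing a given link fragment. It assumes the exported file is an Atom feed with one entry element per post, and the search string and sample feed are made up for illustration; a real export may also contain entries for comments and settings, so treat the output as a starting point only.

```python
import xml.etree.ElementTree as ET

# Atom namespace; assumes the Blogger export is an Atom feed.
ATOM = "{http://www.w3.org/2005/Atom}"


def posts_with_old_links(xml_text: str, needle: str) -> list[str]:
    """Return titles of entries whose content contains the given link fragment."""
    root = ET.fromstring(xml_text)
    titles = []
    for entry in root.iter(ATOM + "entry"):
        content = entry.findtext(ATOM + "content") or ""
        if needle in content:
            titles.append(entry.findtext(ATOM + "title") or "(untitled)")
    return titles


# Tiny made-up feed standing in for the downloaded export file.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Post A</title><content type="html">&lt;a href="http://www.example.org/upload/a.pdf"&gt;x&lt;/a&gt;</content></entry>
  <entry><title>Post B</title><content type="html">no links here</content></entry>
</feed>"""

print(posts_with_old_links(SAMPLE, "example.org/upload/"))
```

This only reads the file and reports titles; the posts themselves would still be edited by hand in Blogger as in step 7, which avoids the re-import error altogether.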

Note - it seems that broken links to the Article 29 Working Party's documents may be fixed by changing (in the URL) -
to -
- but again I've not tested that fully, so use that at your own risk. I haven't the time or strength to update links on this blog at the moment (I have 59 links to A29 papers/pages!), but at least I have the downloaded XML file to use now.

©WH. This work is licensed under a Creative Commons Attribution Non-Commercial Share-Alike England 2.0 Licence. Please attribute to WH, Tech and Law, and link to the original blog post page. Moral rights asserted.

Friday, 19 November 2010

Facial recognition for profiling - by drinks machine

Privacy advocates may be somewhat concerned that a vending machine now exists in Japan, installed in a Tokyo train station, which uses sensors and facial recognition technology to discern a potential customer's gender and age and "recommend" drinks accordingly (based on market research as to the preferences of different ages and genders).

So it offers canned coffee to men (green tea if they're in their 50s), and tea or a sweeter drink to women in their 20s. It even makes different suggestions depending on the time of day.

Sales have apparently tripled following introduction of that technology, and the company involved, JR East Water Business Co (a subsidiary of railway company JR East Co), plans to expand to 500 such machines in Tokyo and neighbouring areas by March 2012.

Talk about biometrics profiling for advertising and marketing! But one can imagine the technology, and indeed individual machines, being used for many more purposes. A sign of times to come?

(What really gives me pause is the go ahead given to Spanish scientists to use silicon barcodes to individually tag human oocytes and embryos for identification - they "aim to develop an automatic code reading system". Are the barcodes going to be removed after birth? Can they be?)


Thursday, 18 November 2010

EU law invalid for interfering unjustifiably with privacy & data protection rights

An EU law requiring online publication of personal data (names of recipients of certain agricultural funds, plus amounts received) was declared invalid by the European Court of Justice. The court held that it unjustifiably interfered with privacy and data protection rights under the European Convention on Human Rights / EU Charter of Fundamental Rights, because it required blanket publication of all recipients' names and amounts however much (or little) they received, however often, whatever the period or type of aid, etc.

This case is interesting because it underlines the possibility that other EU laws could be vulnerable to being struck down by the ECJ for undue interference with privacy rights, should a national court be persuaded to refer the matter to the ECJ. Data Retention Directive, anyone…?

(The Lisbon Treaty does make it easier for individuals and organisations to complain direct to the ECJ about certain limited EU acts, but we don't know how that'll work out in practice yet.)

The EU institutions must act consistently with the Charter, including when making laws. However, the Charter's impact on national laws is more limited. It only applies to member states when they're implementing EU law.

What's more, the UK and Poland weren't happy with the Charter and insisted on a Protocol 30 to the Lisbon Treaty to try to ensure that the Charter won't create new legal rights in the UK or Poland, and won't extend the ability of the ECJ or national courts to invalidate UK or Polish laws / regulations etc as inconsistent with the Charter's fundamental rights. This "opt-out" has been called disgraceful, but it may not be clear yet what the exact legal effect of the Protocol is.

Interestingly, in the recent successful application by ISPs BT and TalkTalk to have the Digital Economy Act judicially reviewed, one basis put forward in their statement of facts and grounds was the disproportionate impact of the Act on rights under the Charter as well as the European Convention on Human Rights, and reports are that the judge will allow the review to consider fully all 4 of the grounds put forward - probably in Q1 2011. (ZDNet's reference to the judge waiting for the European Data Protection Supervisor's opinion seems mistaken, incidentally, as his opinion on ACTA and 3 strikes came out a while back, in June 2010.)

Anyway, here in the UK it seems people's personal data can get published online on government websites without their consent or indeed knowledge, even when there's no law stipulating publication! (Hellooo New Forest District Council…)


The court noted that -

  1. The EU Charter of Fundamental Rights (Wikipedia entry, another explanation, full text) has the same legal importance in the EU as the EU Treaties (since December 2009, when the Lisbon Treaty (Wikipedia entry) came into force).
  2. The validity of the EU Regulation provisions in question here must therefore be evaluated in the light of the Charter, including -
    • data protection - article 8(1) - ‘Everyone has the right to the protection of personal data concerning him or her’, including that personal data ‘must be processed fairly for specified purposes and on the basis of the consent of the person concerned or some other legitimate basis laid down by law’, and
    • privacy - article 7 - 'Everyone has the right to respect for his or her private and family life, home and communications.'
    • (Note - the ECJ has said the Data Protection Directive should be interpreted in the light of fundamental rights under the European Convention on Human Rights anyway, including article 8's right to respect for private life - see Rundfunk)
  3. However, those rights aren't absolute, depending on their function in society, and may be subject to limitations provided for by law which respect the essence of those rights and freedoms and, subject to the principle of proportionality, are necessary and genuinely meet objectives of general interest recognised by the European Union or the need to protect the rights and freedoms of others.
  4. Where rights under the Charter correspond to rights guaranteed by the European Convention on Human Rights, then their meaning and scope should be the same as those under the Convention (article 52(3)), and anyway the Charter doesn't restrict or adversely affect rights recognised by the Convention (article 53).
  5. This means the case law of the European Court of Human Rights (ECHR) is relevant when considering rights under the Charter, and indeed generally - notably the ECHR cases on respect for private life and protection of personal data. (Note - nothing new here, in that the ECJ generally considers the ECHR anyway where appropriate.)
  6. "In those circumstances, it must be considered that the right to respect for private life with regard to the processing of personal data, recognised by Articles 7 and 8 of the Charter, concerns any information relating to an identified or identifiable individual (see, in particular, European Court of Human Rights, Amann v. Switzerland [GC], no. 27798/95, § 65, ECHR 2000‑II, and Rotaru v. Romania [GC], no. 28341/95, § 43, ECHR 2000‑V) and the limitations which may lawfully be imposed on the right to the protection of personal data correspond to those tolerated in relation to Article 8 of the Convention."

In this case, the Regulation in question (1290/2005 on the financing of the common agricultural policy) required information to be published online regarding recipients of aid from certain EU agricultural funds - and publication of someone's name and income is an interference with their privacy, so even if the underlying laudable aim was transparency as to the use of public funds, the publication requirement still had to be legal, proportionate, necessary etc.

The law here (articles 44a and 42(8b) to be precise) wasn't valid as it required indiscriminate publication of all those details "without drawing a distinction based on relevant criteria such as the periods during which those persons have received such aid, the frequency of such aid or the nature and amount thereof." The court did say, no doubt to stem the possible flood of lawsuits, that no one could sue for past publication of those details. Going forward, obviously they can't be published in the same way.

There were other Data Protection Directive issues in the case but I won't cover them here.

Aside - the referring German court here actually tried to get the ECJ to rule on the validity of the Data Retention Directive, and on whether the Data Protection Directive prevents websites from storing the IP addresses of visitors without their express consent.

Sadly for those of us interested in these issues, the ECJ said, rightly, that those questions weren't relevant to this case, which was referred to them following lawsuits by fund recipients whose personal data had been published on a website (not by visitors to that site whose IP address had been recorded).

Case - Volker und Markus Schecke GbR (C-92/09), Hartmut Eifert (C-93/09), 9 Nov 2010.


Saturday, 6 November 2010

Google - data retention periods for different services (including deleted data)

From a privacy viewpoint, how long a service provider keeps your personal data is important. It's one of the key data protection principles in the EU that personal data shouldn't be retained for longer than is necessary for the purpose for which it was processed.

When that purpose is served, strictly the service provider ought to delete that data. And again, strictly that includes any duplicates or backups.

Insiders like employees who have access to users' data can be a major risk to data security. Sometimes they can view users' personal data, eg Google systems engineer David Barksdale (maybe more) who was fired for accessing Gmail / Google Voice / chat data - stories in ValleyWag, Reuters, ComputerWeekly (or consider the FIFA passport details debacle, or the position of Facebook employees). If stored data isn't properly deleted when it should be, that may further increase the risk of its being accessed by an insider (or outsider) who shouldn't.

Google Search - different features, different retention periods

I previously compiled a table comparing the retention periods for search data at the main internet search engines.

So it was interesting to see a blog post by Google's Chief Privacy Counsel Peter Fleischer which mentioned in passing the data retention periods for logs relating to people's use of certain other search-related Google services, which I don't think I've seen documented anywhere else. Here it is in tabular form:

Google Search service | Data retention period
Search logs | 9 months
Instant Search (displays search results as you type) logs | 2 weeks; a Google blog post clarifies "we now store Google Instant's partial query data for up to two weeks in unanonymized form, at which time we will delete 100 percent of it. These data retention changes apply only to queries made when Google Instant is active."
Suggest feature - from 2004, available on mobile, now called Google Autocomplete (provides search auto-completion) logs | 24 hours
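To make the arithmetic concrete, here is a small Python sketch that works out the latest date a log entry would still be held under each of the periods above. It assumes each period is a hard upper bound counted from the moment of logging, which isn't confirmed anywhere, and it approximates "9 months" as 270 days; the service keys and function name are made up for illustration.

```python
from datetime import datetime, timedelta

# Retention periods from the table above; the keys are hypothetical labels.
RETENTION = {
    "search": timedelta(days=9 * 30),  # "9 months", approximated as 270 days
    "instant": timedelta(weeks=2),
    "suggest": timedelta(hours=24),
}


def latest_deletion(service: str, logged_at: datetime) -> datetime:
    """Latest time a log entry would still be retained, under the assumptions above."""
    return logged_at + RETENTION[service]


# A query logged on the date of this post, via Instant Search.
print(latest_deletion("instant", datetime(2010, 11, 6)))
```

On those assumptions, an Instant Search query logged on 6 November 2010 would be gone by 20 November 2010, whereas the same query typed with Instant switched off could be kept well into 2011.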

Aside - on Suggest logs, incidentally, see the Telegraph write up of funniest Google Suggest suggestions.

Deleting data from Google services

Leaving aside search features for now, what about when you delete data from another Google service?

Google have recently introduced a new privacy policy, with supplemental privacy policies for some individual products. The general privacy policy says -

Because of the way we maintain certain services, after you delete your information, residual copies may take a period of time before they are deleted from our active servers and may remain in our backup systems. Please review the service Help Centers for more information.

It doesn't give info about how long before residual copies are deleted from servers, and "may remain in our backup systems" suggests they never get deleted from backup!

It would be good if Google clarified this point for each service, and, better still, deleted them from backups too.

Google Tasks

It's interesting to see a specific privacy policy mentioned for Google's Tasks feature (which you can use in Google Calendar as well as Gmail):

"You can delete tasks that you have created. Such deletions will take immediate effect in your account view, although residual copies may take up to 30 days to be deleted from our servers. In addition, every 90 days, if not more frequently, we permanently delete usage statistics associated with your use of Tasks. We retain this information beyond 90 days in aggregate form only."

But I couldn't find specific mention of periods for deleting residual copies in the case of other Google products, eg Google Apps services.

Google Docs

I noticed recently that when I deleted documents from Google Docs, they were still appearing in my Google Docs search results for about 2 or 3 hours thereafter. (And for how much longer "residual copies" stay on Google's "active servers" or backup servers after that, is anyone's guess.)

This "feature" may in fact be quite an annoyance if you delete documents which you no longer need, but then they keep polluting your Google Docs search results for 2 hours after that - especially if you're working to a deadline and need to find other stuff quickly!

Google really ought to sort this kind of thing out if they want to succeed in selling Google Apps to organisations, as it will probably frustrate business users. Quite apart from the data protection / privacy implications.

In addition, for those who didn't know - even when you delete a Google Docs document, it seems Google will never delete any images / pictures / photos linked to in the document. Not unless and until you specifically contact Google Docs support (not an easy thing to do) and ask them to do so. So don't go putting embarrassing pics in your Google Docs!


It's good that for Gmail at least, Google say they make "reasonable efforts" to remove deleted info ASAP (emphasis added) -

"Data retention

Google keeps multiple backup copies of users' emails so that we can recover messages and restore accounts in case of errors or system failure, for some limited periods of time. Even if a message has been deleted or an account is no longer active, messages may remain on our backup systems for some limited period of time. This is standard practice in the email industry, which Gmail and other major webmail services follow in order to provide a reliable service for users. We will make reasonable efforts to remove deleted information from our systems as quickly as is practical."

It would be helpful to know what's the max "limited period of time"; but note that "reasonable efforts to remove deleted information" is not the same as a guarantee of eventual deletion.

Note that there are special contractual terms of service for Google Apps applications, including Gmail etc; and it seems (though that's not reflected in the terms) that there's more reliable deletion in the case of paid-for services like Apps Premier Edition:

"I received verbal assurances from our salesperson that Google always honors client requests, within reason.  E.g. A deleted account is truly deleted, however there is a 5 day 'grace period', where it seems an accidental deletion can be remedied.  Regarding the dispersal of data amongst data centers (triple redundancy), the sales person indicated it may take a week to remove data from all the caches.  So, it was implied within a certain window of time, all data that a customer wishes to be deleted is destroyed, and after that it is truly gone."

- plus customer-selectable email retention settings and archiving periods, although archiving Google Docs for records management (deliberate records retention in archives for compliance purposes) appears to be problematic.

As for third party apps obtained through Google Apps, it's up to the third party entirely what their data retention policy is. You have been warned.

It seems people are still uncertain about other issues like Google Analytics data retention.

TOS & privacy policies galore?

As well as explaining individual data retention policies, the interaction between all these different terms and conditions (and Google's Privacy Principles unveiled in Jan 2010) could be clearer.

For example, the Google Apps privacy notice is very brief and refers to the basic Google Privacy Policy page - not even the privacy policy itself - and neither of them links or even refers to the "More on Gmail and privacy" page I quoted from above. This can all be rather confusing to the user.

(There's probably a lot of money to be made by anyone who can produce an app to check and cross check TOS across a single website in relation to different services of the same provider - for consistency, cross references etc.)
