Saturday, 6 November 2010

Google - data retention periods for different services (including deleted data)

From a privacy viewpoint, how long a service provider keeps your personal data is important. It's one of the key data protection principles in the EU that personal data shouldn't be retained for longer than is necessary for the purpose for which it was processed.

Once that purpose has been served, strictly speaking the service provider ought to delete the data, and that, again strictly, includes any duplicates or backups.

Insiders such as employees with access to users' data can be a major risk to data security. Sometimes they can view users' personal data: for example, Google systems engineer David Barksdale (and possibly others) was fired for accessing users' Gmail, Google Voice and chat data, as reported by Valleywag, Reuters and ComputerWeekly. Or consider the FIFA passport details debacle, or the position of Facebook employees. If stored data isn't properly deleted when it should be, that may further increase the risk of its being accessed by an insider (or outsider) who shouldn't have access to it.

Google Search - different features, different retention periods

I previously compiled a table comparing the retention periods for search data at the main internet search engines.

So it was interesting to see a blog post by Google's Chief Privacy Counsel Peter Fleischer that mentioned, in passing, the retention periods for logs of people's use of certain other search-related Google services. I don't think I've seen these documented anywhere else. Here they are in tabular form:

Google Search service | Data retention period
Search logs | 9 months
Instant Search (displays search results as you type) logs | 2 weeks. A Google blog post clarifies: "we now store Google Instant's partial query data for up to two weeks in unanonymized form, at which time we will delete 100 percent of it. These data retention changes apply only to queries made when Google Instant is active."
Suggest feature (from 2004, available on mobile, now called Google Autocomplete; provides search auto-completion) logs | 24 hours
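The retention periods in the table can be thought of as a per-service configuration. The following is a minimal Python sketch of that idea, not Google's implementation: the log-type keys (`search`, `instant`, `suggest`) and the functions `is_expired` and `purge` are hypothetical names, and "9 months" is approximated as 270 days because a month has no fixed length. It also simply treats expiry as deletion, whereas the Instant quote suggests that what happens at the end of a window (deletion versus anonymisation) can differ between services.

```python
from datetime import datetime, timedelta

# Retention windows taken from the table. "9 months" is approximated as
# 9 * 30 = 270 days; this approximation is an assumption of the sketch.
RETENTION = {
    "search": timedelta(days=270),
    "instant": timedelta(weeks=2),
    "suggest": timedelta(hours=24),
}


def is_expired(log_type: str, logged_at: datetime, now: datetime) -> bool:
    """Return True if a log entry of this type is older than its retention window."""
    return now - logged_at > RETENTION[log_type]


def purge(entries: list[tuple[str, datetime]], now: datetime) -> list[tuple[str, datetime]]:
    """Keep only (log_type, logged_at) entries still inside their retention window.

    Expired entries are dropped outright; a real service might anonymise
    them instead.
    """
    return [(t, ts) for t, ts in entries if not is_expired(t, ts, now)]
```

For example, with `now` fixed, a Suggest entry logged 25 hours earlier is past its 24-hour window, while a Search entry from 200 days earlier is still inside the approximated 9 months.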

Incidentally, on the subject of Suggest, see the Telegraph's write-up of the funniest Google Suggest suggestions.

Deleting data from Google services

Leaving aside search features for now, what about when you delete data from another Google service?

Google have recently introduced a new privacy policy, with supplemental privacy policies for some individual products. The general privacy policy says:

Because of the way we maintain certain services, after you delete your information, residual copies may take a period of time before they are deleted from our active servers and may remain in our backup systems. Please review the service Help Centers for more information.

It doesn't say how long it takes for residual copies to be deleted from active servers, and "may remain in our backup systems" suggests they may never be deleted from backups at all!

It would be good if Google clarified this point for each service and, better still, deleted residual copies from backups too.

Google Tasks

It's interesting to see a specific privacy policy mentioned for Google's Tasks feature (which you can use in Google Calendar as well as Gmail):

"You can delete tasks that you have created. Such deletions will take immediate effect in your account view, although residual copies may take up to 30 days to be deleted from our servers. In addition, every 90 days, if not more frequently, we permanently delete usage statistics associated with your use of Tasks. We retain this information beyond 90 days in aggregate form only."
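The Tasks policy describes a familiar pattern: keep detailed per-user usage statistics for a bounded window, then keep only aggregates. Below is a minimal Python sketch of that pattern, assuming a simple in-memory list of events. The `UsageEvent` fields, the action names and `aggregate_and_purge` are hypothetical, this is not how Google actually implements it, and the sketch does not model the separate 30-day window for residual copies of deleted tasks.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class UsageEvent:
    user_id: str  # identifies the user; must not survive past the window
    action: str   # hypothetical action names, e.g. "create_task"
    day: date


USAGE_WINDOW = timedelta(days=90)


def aggregate_and_purge(events: list[UsageEvent], aggregate: Counter, today: date) -> list[UsageEvent]:
    """Fold events older than 90 days into per-action totals, then drop them.

    `aggregate` is keyed by action only (no user_id) and is updated in place.
    Returns the per-user events that are still inside the window.
    """
    kept = []
    for event in events:
        if today - event.day > USAGE_WINDOW:
            aggregate[event.action] += 1  # only the count survives, not who did it
        else:
            kept.append(event)
    return kept
```

The key design point is that the aggregate never stores `user_id`, so once an event is folded in, the statistic can no longer be linked back to an individual user.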

However, I couldn't find any specific period for deleting residual copies mentioned for other Google products, eg Google Apps services.

Google Docs

I noticed recently that when I deleted documents from Google Docs, they kept appearing in my Google Docs search results for about 2 or 3 hours afterwards. (How much longer "residual copies" stay on Google's "active servers" or backup servers after that is anyone's guess.)

This "feature" may in fact be quite an annoyance if you delete documents you no longer need, only for them to keep polluting your Google Docs search results for the next 2 or 3 hours, especially if you're working to a deadline and need to find other stuff quickly!

Google really ought to sort this kind of thing out if they want to succeed in selling Google Apps to organisations, as it will probably frustrate business users, quite apart from the data protection and privacy implications.

In addition, for those who didn't know: even when you delete a Google Docs document, it seems Google will never delete any images, pictures or photos linked to in the document, unless and until you specifically contact Google Docs support (not an easy thing to do) and ask them to. So don't go putting embarrassing pics in your Google Docs!

Gmail

It's good that, for Gmail at least, Google say they will make "reasonable efforts" to remove deleted information as quickly as is practical:

"Data retention

Google keeps multiple backup copies of users' emails so that we can recover messages and restore accounts in case of errors or system failure, for some limited periods of time. Even if a message has been deleted or an account is no longer active, messages may remain on our backup systems for some limited period of time. This is standard practice in the email industry, which Gmail and other major webmail services follow in order to provide a reliable service for users. We will make reasonable efforts to remove deleted information from our systems as quickly as is practical."

It would be helpful to know the maximum "limited period of time". Note also that making "reasonable efforts to remove deleted information" is not the same as guaranteeing that it will eventually be deleted.

Note that there are special contractual terms of service for Google Apps applications, including Gmail; and it seems (though this isn't reflected in the terms) that deletion is more reliable for paid-for services such as Google Apps Premier Edition. As one customer put it:

"I received verbal assurances from our salesperson that Google always honors client requests, within reason.  E.g. A deleted account is truly deleted, however there is a 5 day 'grace period', where it seems an accidental deletion can be remedied.  Regarding the dispersal of data amongst data centers (triple redundancy), the sales person indicated it may take a week to remove data from all the caches.  So, it was implied within a certain window of time, all data that a customer wishes to be deleted is destroyed, and after that it is truly gone."

Paid-for services like Apps Premier Edition also offer customer-selectable email retention settings and archiving periods, although archiving Google Docs for records management (deliberately retaining records in archives for compliance purposes) appears to be problematic.

As for third-party apps obtained through Google Apps, their data retention policy is entirely up to the third party. You have been warned.

It seems people are still uncertain about other issues like Google Analytics data retention.

TOS & privacy policies galore?

As well as explaining its individual data retention policies, Google could make the interaction between all these different terms and conditions (and Google's Privacy Principles, unveiled in January 2010) clearer.

For example, the Google Apps privacy notice is very brief and refers to the basic Google Privacy Policy page (not even the privacy policy itself), and neither of those links to, or even mentions, the "More on Gmail and privacy" page I quoted from above. This can all be rather confusing for users.

(There's probably a lot of money to be made by anyone who can produce an app that checks and cross-checks the terms of service for a single provider's different services, for consistency, cross-references and so on.)
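One small piece of such a tool could be a reachability check: given which policy pages link to which, report the pages a user cannot reach by following links from a given starting page. The sketch below is hypothetical Python. The `LINKS` data encodes only the links described above for the Google Apps privacy notice (every other page's outgoing links are left empty), not a crawl of Google's site, and `unreachable_from` is an illustrative name.

```python
# Hypothetical link data: only the links described in the post are included.
LINKS: dict[str, set[str]] = {
    "Google Apps privacy notice": {"Google Privacy Policy page"},
    "Google Privacy Policy page": set(),
    "More on Gmail and privacy": set(),
}


def unreachable_from(start: str, links: dict[str, set[str]]) -> set[str]:
    """Return the known pages that cannot be reached by following links from `start`.

    Uses an iterative depth-first search over the link graph.
    """
    seen = {start}
    stack = [start]
    while stack:
        page = stack.pop()
        for target in links.get(page, set()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return set(links) - seen
```

Starting from the Google Apps privacy notice, this reports the "More on Gmail and privacy" page as unreachable, which is the kind of gap the example above describes.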

©WH. This work is licensed under a Creative Commons Attribution Non-Commercial Share-Alike England 2.0 Licence. Please attribute to WH, Tech and Law, and link to the original blog post page. Moral rights asserted.