Friday, 26 November 2010

ICO - fixing broken links on your site to ICO papers or press releases

UPDATE - to find the new URL, just enter the old URL in this form - blogged here.

If your blog or website has links to files (press releases, documents) on the UK Information Commissioner's site, here is a heads up that links from about October 2010 may be broken.

Anyone who read this blog a year ago will know that linkrot - ie sites, especially official sites, breaking or killing links to their site eg on a site revamp - sets my teeth on edge. Yeah, sometimes I just have to channel my inner law librarian. But really, old URLs should not be changed. It's a usability nightmare that is all too common. Surely it would not be difficult to redirect links in the old format to the new URLs on the ICO site. See my suggested (Word-specific) regex below, for instance.

The ICO must have tweaked their site recently. Links to webpages still work; it's just links to pdfs that don't.

For instance, any link to -
http://www.ico.gov.uk/upload/documents/pressreleases/2010/response_to_moj_dpframework_press_release_06102010.pdf

won't work now; the link should instead be to -
http://www.ico.gov.uk/~/media/documents/pressreleases/2010/response_to_moj_dpframework_press_release_06102010.ashx

In other words, in the URL change "upload" to "~/media" and change "pdf" at the end to "ashx".
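
For anyone who would rather script that change than edit by hand, here is a minimal Python sketch of the rule just described. The function name to_new_ico_url is my own invention, and the pattern assumes the old links always look like the example above (an "upload" folder straight after the domain, ".pdf" at the very end); URLs that don't match are returned untouched.

```python
import re

# Assumption: old-format links look exactly like the example in this post,
# i.e. http://www.ico.gov.uk/upload/<path>.pdf
_OLD_ICO_PDF = re.compile(
    r"^(http://www\.ico\.gov\.uk/)upload(/.*)\.pdf$", re.IGNORECASE
)


def to_new_ico_url(url):
    """Return the new-format URL for an old-format ICO PDF link.

    Any URL that does not match the old format is returned unchanged.
    """
    match = _OLD_ICO_PDF.match(url)
    if match is None:
        return url
    # Swap "upload" for "~/media" and the ".pdf" ending for ".ashx".
    return match.group(1) + "~/media" + match.group(2) + ".ashx"
```

It only rewrites the text of the URL; it doesn't check that the new address actually exists on the ICO site, so test a few rewritten links in a browser before trusting the rest.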

From my very limited testing, links (even to PDFs) in the old format to files published before about October 2010 do still work, although changing them to the new format also won't break them - but note that I don't guarantee either point!

Luckily, most of us shouldn't have too many links to fix, as only links to PDFs published on the ICO site from about 1 Oct 2010 seem to be affected.

I updated the ICO links on this blog manually. Below is what I did, in case it helps anyone else who uses Blogger as their blogging platform; the same method may be of assistance in updating links for the recent europa.eu changes (see below), as well as for any other blog posts you might need to update "en masse" (ish) on Blogger in future -

  1. The Blogger Data API doesn't support the q text search parameter - else I'd have tried coding something in Java or Python to find and download the relevant posts, update the links and re-upload them.
  2. So you can instead try downloading the XML file of all blog posts (Dashboard > Settings > Basic, it's the Export blog link against Blog Tools - good to use it for regular backup generally, anyway - then Download Blog).
  3. Open the XML file in eg Word.
  4. Search in Word for http://www.ico.gov.uk/upload/ to find the broken links, or more precisely, in order to note down the titles of the blog posts which contain those links - as mentioned above, it seems you won't have to go back beyond about 1 Oct 2010, but I don't guarantee that.
  5. You could perhaps use regular expressions in Word (tick "Use wildcards" in Word's Find and Replace box) to find - 
      (http://www.ico.gov.uk/)(upload)(*)(pdf)
    and replace it with -
      \1~/media\3ashx
    (no brackets in the replacement, as Word would insert them literally) - then save the XML file and re-import it into Blogger. But
  6. There's a Blogger problem with importing the edited XML file. I got the dreaded error bX-tjg9ds with a test blog that had just 4 test blog posts. Plus, I'm a bit nervous about trying to export and re-import the entire blog, especially given the error.
  7. So in the end I just searched the downloaded XML file to find all the broken links as mentioned in 4, ie blog posts published in Oct/Nov 2010, then I signed in to Blogger, and in Edit Posts I found the relevant blog posts and updated them manually in HTML view. (Or if you use the excellent Windows Live Writer you could perhaps retrieve each relevant published post from Blogger, edit it manually, then republish it; but I didn't do that myself as I haven't had time to test whether it would republish with the same timestamp or not. I believe it would, but try it at your own risk!)
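
The searching part of steps 3 and 4 could also be done without Word. Below is a rough Python sketch that lists the title and publication date of every entry in the exported file whose content contains the old ICO prefix. It assumes the export is an Atom feed whose entries have title, published and content elements; the function name is made up, and note that it checks every entry in the file, not only posts, so comments containing the old links would show up too.

```python
import xml.etree.ElementTree as ET

# Assumption: the Blogger export is an Atom feed using this namespace.
ATOM = "{http://www.w3.org/2005/Atom}"
OLD_PREFIX = "http://www.ico.gov.uk/upload/"


def find_entries_with_old_links(xml_text, needle=OLD_PREFIX):
    """Return sorted (published, title) pairs for entries containing needle."""
    root = ET.fromstring(xml_text)
    hits = []
    for entry in root.iter(ATOM + "entry"):
        # findtext returns the element text with XML escapes already decoded.
        content = entry.findtext(ATOM + "content") or ""
        if needle in content:
            title = entry.findtext(ATOM + "title") or "(untitled)"
            published = entry.findtext(ATOM + "published") or ""
            hits.append((published, title))
    return sorted(hits)
```

To use it, read the downloaded XML file into a string, pass it in, and work through the resulting list in Edit Posts as in step 7. It only reads the file and changes nothing, so it sidesteps the import error in step 6.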

Note - it seems that broken links to the Article 29 Working Party's documents may be fixed by changing (in the URL) -
  justice_home/fsj
to -
  justice/policies
- but again I've not tested that fully, so use that at your own risk. I haven't the time or strength to update links on this blog at the moment (I have 59 links to A29 papers/pages!), but at least I have the downloaded XML file to use now.
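
If you'd rather apply that Article 29 change to a post's HTML in one go, something like the following Python sketch might do it. It is as untested against the real site as the rule itself. The function name is hypothetical, and I've assumed the affected links are all on a europa.eu host, so the swap is only made inside URLs containing "europa.eu/" and not in ordinary text.

```python
import re

# Assumption: the affected links are on a europa.eu host. The swap is limited
# to URLs on that host, so the phrase is left alone anywhere else in the post.
_A29_OLD = re.compile(
    r"(https?://[^\s\"'<>]*europa\.eu/[^\s\"'<>]*?)justice_home/fsj"
)


def fix_a29_links(html):
    """Replace justice_home/fsj with justice/policies inside europa.eu URLs."""
    return _A29_OLD.sub(r"\1justice/policies", html)
```

Paste a post's HTML in as a string and compare the result with the original before republishing, since this rewrites the link text without checking that the new pages exist.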

©WH. This work is licensed under a Creative Commons Attribution Non-Commercial Share-Alike England 2.0 Licence. Please attribute to WH, Tech and Law, and link to the original blog post page. Moral rights asserted.