Friday, 7 May 2010

hNews & Value Added News, hNews tutorial - & copyright and licensing options to follow?

As previously promised, this blog post explains hNews and how to implement it on a Blogger blog like this one. Tech, not law, for a change.

While the US press agency Associated Press (AP) was the prime mover behind hNews, the microformat is potentially useful for other news sources and media, like blogs or informational websites. It enables search engines to index news items in a more useful way, by allowing news providers to "tag" their content with machine-readable "metadata" carrying specific types of information about the individual news item, its author and so on. It's been adopted by AOL, for example.

I've tried to make this blog hNews-compliant. You'll notice the small text at the end of each blog post with copyright and attribution info, as hNews requires each "news" item to have separate licensing info, and also the shiny new "Value Added" button at the end of each post which, when you hover over it, provides some information about the blog post.

The AP Registry "employs the hNews microformat to encapsulate AP and member content in an informational wrapper that not only offers publishers a way to prime the content better for search purposes, it also includes a permissions framework that lets them specify how and when their content is to be used online."

AP said "ultimately, it could enable new ways of doing business by offering them the opportunity to let their content flow where consumers want to see it as well as a common way of analyzing use across all platforms."

What's hNews, what does it do, and how?

Basically, whenever AP publishes content, they "mark up" or, if you like, "tag" or "wrap up" the content using hNews, which is what's known as a microformat. hNews was developed by AP with the London-based Media Standards Trust to "prime the content better for search purposes", and it extends the hAtom microformat.

To people, the content reads no differently; the markup text is invisible to the human eye (unless you View Source in your browser, of course).

However, to computers, the content gains new meaning with the addition of hNews markup. (It's metadata, to librarians & information scientists, who will know all this anyway. Note that microformats are not the same as the semantic web.)

Their related Value Added News site explains the benefits of using hNews to produce what they call "value added news". Using hNews with an article will make it more machine-readable, including to search engine spiders, providing info to search engines etc about:

  • Who wrote it
  • The title of the article
  • Who it was written for
  • Whether it was edited (and who by)
  • When it was first published
  • How it has changed since publication
  • When it was last updated
  • What key subjects it is about
  • What journalistic codes of practice (if any) it adheres to
  • What usage rights are associated with the article.

(That last item is of course the bit to do with copyright licences and permissions to use the article, if any.)

There's a search engine for "Value Added News" that's in alpha currently, and if you try it you'll see it enables filtering of search results by Author, Tag, People, Organisation and Places. So far the only source it indexes seems to be OpenDemocracy.net. And remember it's still in alpha so it won't work fully as expected yet, e.g. even if there are more than 5 results it seems to only show the first 5 with no way to move to the next page as far as I can see (at the time of writing this).

As other search engines become more microformats-aware, hopefully hNews will be ever more useful, and perhaps be used on more sites.

hNews - minimum requirements

The easiest illustration I can think of is the example news story that's been marked up with only the hNews items that are absolutely required: code (invisible to humans) to indicate that it's hNews-formatted (the class names "hnews hentry"), then to indicate which bit of the news story is the entry title ("entry-title"), which part is the author's name ("author"), the publishing organisation ("source-org") and the date last updated ("updated").
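To make that concrete, here is a minimal sketch of what such markup might look like. It is not the linked example itself: the headline, names and date are hypothetical placeholders, and the "entry-content" wrapper is an optional hAtom class I've included only to show where the story text would sit.

```html
<!-- Hypothetical article: headline, names and date are placeholders -->
<div class="hnews hentry">
  <!-- entry-title: the headline -->
  <h1 class="entry-title">Example headline</h1>
  <p>
    <!-- author: an hCard (vcard) with the writer's name -->
    By <span class="author vcard"><span class="fn">Jane Example</span></span>,
    <!-- source-org: an hCard for the publishing organisation -->
    <span class="source-org vcard"><span class="fn org">Example News</span></span>
  </p>
  <p>
    <!-- updated: machine-readable ISO 8601 date in the title attribute -->
    Last updated
    <abbr class="updated" title="2010-05-07T09:00:00Z">7 May 2010</abbr>
  </p>
  <!-- entry-content: optional wrapper for the body text -->
  <div class="entry-content">
    <p>Body text of the story goes here.</p>
  </div>
</div>
```

A reader sees only the headline, byline, date and story; the class names are what a microformats-aware parser would pick up.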

And see another example and the Value Added News general howto.

How to add basic hNews to a Blogger blog - tutorial

There's no reason not to use the core elements of hNews as it's quite easy to make Blogger blogs, hosted on Google's Blogspot.com service, minimally hNews-compatible in a basic way.

Below are extracts from my blog template (based on one of the standard Blogger-provided templates) showing how I edited it to automatically include basic hNews formatting for my blog posts. The new bits are in bold red, and as you can see it doesn't take much tweaking to deal with the required fields. The code is for my specific template but it won't be very different for others.

For the non-technical - to edit your template, log in to Blogger, go to the Layout tab, click Edit HTML. Click "Download Full Template" to back up your template first; and note that you do these edits at your own risk! You also need to check "Expand Widget Templates" before you can edit it properly.

Blogger seem to have done a lot of the work already by including hAtom markup in their standard templates. As mentioned, the extra bits I've added are in bold red (hnews, item, entry-title) -

<b:includable id='post' var='post'>
  <div class='post hnews hentry item'>
    <a expr:name='data:post.id'/>
    <b:if cond='data:post.title'>
      <h3 class='post-title entry-title'>
        <b:if cond='data:post.link'>
          <a expr:href='data:post.link'><data:post.title/></a>
        <b:else/>
          <b:if cond='data:post.url'>
            <a expr:href='data:post.url'><data:post.title/></a>
          <b:else/>
            <data:post.title/>
          </b:if>
        </b:if>
      </h3>
    </b:if>

And further down, again the bits I've added are in bold red (author, source-org, updated and the new span - plus a comma space so it would read OK with my name and then the blog name); obviously in other people's blogs "Tech and Law" would be changed to something else -

<span class='post-author author dateline vcard'> <b:if cond='data:top.showAuthor'> <data:top.authorLabel/> <span class='fn'> <data:post.author/></span></b:if>, </span><span class='source-org vcard'><span class='org fn'>Tech and Law</span>, <span class='adr'><abbr class='locality' title='London'>London</abbr><abbr class='region' title='England'/><abbr class='country-name' title='United Kingdom'/></span></span></span>

<span class='post-timestamp'>
  <b:if cond='data:top.showTimestamp'>
    <data:top.timestampLabel/>
    <b:if cond='data:post.url'>
      <a class='timestamp-link' expr:href='data:post.url' rel='bookmark' title='permanent link'><abbr class='published updated' expr:title='data:post.timestampISO8601'><data:post.timestamp/></abbr></a>
    </b:if>
  </b:if>
</span>

And don't forget to Preview it and of course Save Template once you're happy with it.

These edits mean that blog posts will now automatically include hNews info about the post's title, author's name, publishing organisation (the blog name and its location) and the date published and last updated.

You can test if it works by entering the URL of your blog post in Google's Rich Snippets Testing service - e.g. see mine.

For the optional extra fields in the body of blog posts, more work would be needed. I just cover copyright licence / usage info, further below.

The ValueAddedNews button

The code for the "Value Added" button at the end of each blog post is from ValueAddedNews. I pasted their code for the "Smart" label -

<span class="vab-container"><img src="http://valueaddednews.org/images/vab/vab_100x20.jpg"
width="100" height="20" /> <span class="vab-popup"></span>
</span><script src="http://valueaddednews.org/js/vab.js" type="text/javascript"></script>

just above <span class='post-comment-link'> in my Blogger template.

There seems to be an issue with this button, because hovering over the button only displays limited info - it's not showing the info about my blog name and geographical locality (from the red, bold italicised code above), and it should.

I've contacted the hNews/ValueAddedNews people but not heard anything back so I don't know if the problem lies with their Javascript or my attempted implementation of hNews. I suspect that it's them, not me, because the Google tool picks up organisation and locality info perfectly, as it should - see this example.

Copyright licence and usage information

It remains to be seen what usage rights info will be added by hNews users.

There's general information on the item-license, with illustrations. However, while the examples given include Creative Commons licences and ValueAddedNews notes that "these usage rights can be written by you, rather than those defined by Creative Commons", no examples have been given of the more restrictive kinds of licences that I suspect the traditional news industry will prefer (e.g. "No permission to copy or do anything else at all").

Of course, the absence of a licence means (to lawyers at least) the absence of permission to copy. No explicit licence, best not copy.

I imagine people will be drafting licences (and putting them on websites to be linked to), and it will be interesting to see what their terms are.
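As a rough sketch of how a more restrictive, self-written licence might be attached to a single article, the markup could follow the same rel pattern I use further below for my Creative Commons licence. Everything specific here is hypothetical: the organisation name, the example.com URL and the wording of the terms are placeholders, not a real licence.

```html
<!-- Hypothetical: the organisation, URL and wording are placeholders -->
<p>
  &#169; Example News. All rights reserved. Use of this article is subject to our
  <a rel="license item-license" href="http://example.com/terms/no-reuse">terms of use</a>.
</p>
```

The link target would be a page, written by the publisher, setting out what (if anything) others may do with the article.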

I chose to use a Creative Commons licence for this blog. However, hNews doesn't let you link to standard usage rights from the blog generally; you have to link from each individual blog post, which is why I have copyright and licence info at the bottom of every post now. As ValueAddedNews say, each blog post or article can have different usage rights, which is useful for the various reasons they give.

In my case, I've changed my template to make usage rights virtually the same for each post - and specified that if people quote my content, I'd like them to credit me in a particular way. The only difference is that I want them to link to the exact URL of the relevant blog post (rather than to my blog generally), so the URL is different for each post.

I added the appropriate code for usage rights to my Blogger template, between

<data:post.body/>
<div style='clear: both;'/> <!-- clear for photos floats -->

and

</div>

The code I used is below; your mileage will vary e.g. you may not want to use the same CC licence as me (and you probably shouldn't if you're not UK-based, as you may need a local licence), and you'll certainly want to change especially the bits in red to suit your own blog -

<p style='font-size:x-small; line-height:90%;'>&#169;WH. This work is licensed under a <a href='http://creativecommons.org/licenses/by-nc-sa/2.0/uk/' rel='license item-license'>Creative Commons Attribution Non-Commercial Share-Alike England 2.0 Licence</a>. Please attribute to <span class='attribution vcard'><a class='fn url' href='http://www.blogger.com/profile/01409117377874267312'>WH</a></span>, <span class='vcard'><span class='fn org'><a class='url' href='http://blog.tech-and-law.com/'>Tech and Law</a></span></span>, and link to <a class='attribution' expr:href='data:post.url'>the original blog post page</a>. Moral rights asserted.</p>

expr:href='data:post.url' is what produces the URL unique to each individual blog post.

hNews markup in the body of the post

You can also add value to individual blog posts by marking up the content e.g. I've done it as an experiment in one post to indicate, for machine-readability, the people and organisations mentioned in the post (see the data extracted from that blog post by the Google Rich Snippets tool).

For what markup to add to designate different concepts, see the hNews examples and spec.

Sadly, however, this can't be done by just adding code to your template. You have to tediously mark up each individual element (e.g. name of individual) manually, one by one - or at least I did, in my test blog post (view source to see the <span class="vcard"><span class="org"> etc which I inserted for organisation names like ENISA - the Google tool shows all the metadata added).
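For illustration, hand-marking a sentence might look something like the sketch below. It is not copied from my test post: the sentence and the person's name are made up, and I've used the usual hCard convention of putting "fn" and "org" on the same element to say that the card is for an organisation rather than a person.

```html
<!-- Hypothetical sentence; the person's name is a placeholder -->
<p>
  A report from
  <span class="vcard"><span class="fn org">ENISA</span></span>
  was discussed by
  <span class="vcard"><span class="fn">Jane Example</span></span>.
</p>
```

Each name needs its own pair of spans, which is why doing this by hand across a whole post gets tedious.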

I hope that tools to facilitate hNews markup of selected text in a news article or blog post will be forthcoming, otherwise people just won't do it - it's too difficult currently.

More info on hNews

For the technically minded, this presentation is very helpful:

See also the hNews technical specification.

©WH. This work is licensed under a Creative Commons Attribution Non-Commercial Share-Alike England 2.0 Licence. Please attribute to WH, Tech and Law, and link to the original blog post page. Moral rights asserted.