Thursday, 11 August 2011

Crowdsourcing

Crowdsourcing is going to be one of the themes of this year's NDF conference - asking the public to help you fix or supplement the data and collections from cultural institutions has certainly moved from being brave three or four years ago to being mainstream today.

That's why two recent stories about crowdsourcing caught my eye. Photo agency Magnum has partnered with Tagasauris, a photo-tagging company, to encourage more specialist tagging of their archive. As reported in the Independent:

Tagasauris already registers some 2,500 taggers working across the Magnum archive. There is, however, one problem with their work: it isn't necessarily specialised. While these taggers might be able to offer general information, for the most part they lack the kind of expert knowledge that could contribute more meaningfully – the nature of the film used for a particular shot, perhaps, or the back story of the photographer.

Social networking and a bit of competitive gaming are, hopefully, going to get the specialists working:

Magnum hopes to recruit would-be gamers with superior expertise through social networking – it currently has more than 300,000 followers on Twitter and 135,000 Facebookers who "like" its page. Every time there is a new image to be inspected, it will be posted on Magnum's Twitter feed. Participants can share the photos, spreading the knowledge base wider.

...

Carter insists there's yet more to be gained by having contributors compete with one another. When The New York Times photojournalism blog posted an article on the Magnum project, several thousand people contacted the agency. "The question is: how much broader would that response have been if the task was gamified? My suspicion is that there are millions of eligible people who would do it," says Carter.


(This blog post from the New York Times has more details about quality control and linking up the metadata, if such things ring your bells.)

The project is currently limited, according to the Independent, to 50 volunteers - you can sign up to be accepted later on. Interestingly, this Wired article, about Tagasauris tagging that found previously uncatalogued photos taken on the set of American Graffiti, notes that rather than 'volunteers' in the sense we might immediately think of, the work is being done through Amazon's Mechanical Turk programme.

My own favourite crowdsourcing projects are focused on the correction of the muddly text produced when documents/manuscripts are digitised and then 'translated' into searchable text using optical character recognition. The translation isn't by any means exact - even crisp typewritten documents will render errors, while 19th century newspapers, which are usually digitised from microfilm, vary from 78% to 98% accurate. 98% accurate sounds great, until you realise that it doesn't mean only two words in every hundred will be mangled - every character has a 2% chance of being mis- or unrecognised, and every word with a muddled character risks being unsearchable.
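To put rough numbers on that, here's a back-of-envelope sketch in Python. It assumes every character is misread independently with the same probability, which real OCR errors aren't (they tend to cluster around smudges and odd typefaces), and the function name and the word lengths are just my own picks for illustration.

```python
def word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that every character in a word is recognised correctly.

    Assumes each character is misread independently with the same
    probability - a simplification, not how real OCR errors behave.
    """
    return char_accuracy ** word_length


# The two ends of the accuracy range quoted for digitised newspapers.
for char_accuracy in (0.98, 0.78):
    # Illustrative word lengths: a short word and a longer one.
    for word_length in (5, 8):
        intact = word_accuracy(char_accuracy, word_length)
        print(
            f"{char_accuracy:.0%} character accuracy, "
            f"{word_length}-letter word: {intact:.1%} chance of being searchable"
        )
```

On those assumptions, a five-letter word comes through intact about 90% of the time at 98% character accuracy, and only about 29% of the time at 78%.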

So I was delighted to see that a massive project is being undertaken to digitise and correct the two weekly magazines that Charles Dickens edited, in time for Dickens' bicentenary in 2012. Appealingly, the site has been released in beta, and volunteers can make corrections as the site continues to be developed.

I signed up last night to have a play - while the actual editing interface is a delight to use, and the instructions thorough and easy to follow, I've yet to find an easy way of completing one page and moving on to the next, without having to go to the homepage of the site each time. Ditto, it took me several minutes to find something I could do - unlike the Australian National Library's newspaper correction project, which I've also contributed to, here you 'claim' a journal issue and it is locked so only you can edit it. It would have been useful to see a list of unclaimed journals to choose from.

But I'm being picky. It's a great project, a 24-page issue feels like a manageable and worthwhile piece of work, and I'm looking forward to it.

2 comments:

barnaclebarnes said...

I know I've been harping on about this since DigitalNZ launched but we have a real opportunity to build a great crowdsourced tagging/correction system based on the API. With an aggregated API feed the cost could be spread across multiple organisations to everyone's benefit.

Glen

PS: So gutted I can't make it to NDF this year.

Courtney Johnston said...

I'm gutted too :( We might have a minute's silence for absent friends.

I'm not sure what DigitalNZ's plans are around using the API to crowdsource. It would definitely spread the costs and benefits; the most important (and tricky) bit is sucking the data back into the individual collection systems. I know Brooklyn Museum (and maybe Powerhouse) made some tools to enable this from Flickr Commons.

I think DigitalNZ will be doing a meet-up in the days before NDF, so I'll go along and raise this for you ...