Monday 7 February 2011

Digital archives and crowdsourcing

As regular users of newspaper digital archives well know, when a scanned image is turned into text, as little as 60% of the resulting article's words can turn up in the right place. It is an irritant, but because the cost of correcting an entire archive is prohibitive, most institutions choose to do this only on an ad hoc basis. However, the National Library of Australia has turned to its users to amend jumbled text in a massive crowdsourcing exercise. This has seen millions of lines of newspaper text corrected, making searches more accurate.

The Sydney Morning Herald recently reported that the public's help in working on the text has been particularly important over the past few weeks: with huge areas of Queensland under water, many Australians have been seeking news reports from 1974, the last time there was flooding on such a massive scale.

A video about the library's project can be seen here, while an article about how and why libraries should do crowdsourcing appeared in The Magazine of Digital Research. See also how newspapers such as the Guardian engage their readers in similar exercises.