Crowdsourcing and Variant Digital Editions – some troubles ahead

Projects like UCL’s Transcribe Bentham and New York Public Library’s What’s on the Menu? have done groundbreaking work in engaging the public to transcribe their manuscript collections.

Crowdsourcing allows rapid and, it seems, high-quality creation of transcribed data from original documents. Transcribe Bentham has so far created 1,330 transcribed versions, and only a handful have been rejected for a lack of quality. Previously, such scholarly transcription would have taken considerable time and effort, spanning many years.

With notable successes like these, crowdsourcing is becoming more familiar as an academic tool. But for certain datasets, particularly ones of considerable academic importance, it could bring problems, because crowdsourcing can create multiple editions of the same material.

For example, the much-lauded Early English Books Online (EEBO) and Eighteenth Century Collections Online (ECCO) are now beginning to appear on many different digital platforms.

ProQuest currently hold a licence that allows users to search over the entire EEBO corpus, while Gale-Cengage own the rights to ECCO.

Meanwhile, JISC Collections are planning to release a platform entitled JISC Historic Books, which makes licensed versions of EEBO and ECCO available to UK Higher Education users.

And finally, the Universities of Michigan and Oxford are heading the Text Creation Partnership (TCP), which is methodically working its way through releasing full-text versions of EEBO, ECCO and other resources. These versions are available online, and are also being harvested out to sites like 18th Century Connect.

So this gives us four entry points into ECCO – and it’s not inconceivable that there could be more in the future.

What’s more, there have been some initial discussions about introducing crowdsourcing techniques to some of these licensed versions, allowing permitted users to transcribe and interpret the original historical documents. But of course this crowdsourcing would happen on different platforms with different communities, who may interpret and transcribe the documents in different ways. This could lead to the tricky problem of different digital versions of the corpus: rather than there being one EEBO, several EEBOs would exist.

But this is part of a larger problem. If there are multiple versions of the original content, then which one is the one you use? In fact it’s not only about the content. Which platform works quickest? Which gives the most ‘accurate’ search results? Which one provides enhanced tools for analysis? Which gives the best results for your particular area of research? Where do you send your students? Which one do you cite?

Most importantly, which one do you trust? And why?

In ‘traditional scholarship’, different editions of original documents would be published at, for example, 50-year intervals, and it would be part of the scholarly workflow to review and criticise such editions. The complexity and proliferation of digital resources radically changes this – not only are there more digital resources, but the knowledge and skills needed to critically analyse a resource are considerably broader.

At the moment, there are no immediate solutions to these challenges. But it’s clear that the potential of the Internet continues to fracture existing practices of scholarship. Despite the care, attention, and research intelligence that has gone into creating EEBO, ECCO and their various platforms, the potential for academics, funders and publishers to push forward and develop new digital ideas means that the notion of the Internet as a place where traditional scholarly practices can simply be repeated continues to disintegrate.

(Thanks to Ben Showers for reading over this)


New Community Collection Project looking for submissions

‘How easily can treasure
buried in the ground, gold hidden
however skilfully, escape from any man!’

Seamus Heaney (transl.) Beowulf

A new exemplar community collection is now live: Project Woruldhord.

The project is trialling the processes and the community-contributed collection (‘CoCoCo’) software being developed by the RunCoCo project.

The project is trying to collect any material that would help people who wish to find out more about the Anglo-Saxon period of history and its language and literature.

The project is looking for images, audio/video recordings, handouts, essays, articles, presentations, spreadsheets, databases, and so on.

In particular it is hoped teachers/researchers will contribute teaching material they are happy to share with others.

The most important page to get started is:
http://poppy.nsms.ox.ac.uk/woruldhord

This takes you through the simple-to-use submission process, where you can upload your object and provide some basic information about it.

If you have any questions, please email the project: woruldhord@oucs.ox.ac.uk


Different Forms of Crowdsourcing

The British Museum’s ‘Wikipedian-in-Residence’, Liam Wyatt, recently gave a talk to JISC on some of the work that the British Museum and Wikipedia were doing together.

In particular, Liam focussed on the Hoxne Challenge, a one-day event organised at the British Museum at the end of June 2010.

Hoxne Hoard - British Museum

Rather than the usual model of building up an article slowly over time with geographically dispersed contributors, this event brought together numerous experts and enthusiasts to see if they could construct a high-quality, in-depth article on a particular topic.

The topic was the Hoxne Hoard, a collection of late Roman gold and silver coins and other precious items discovered in 1992.

The team included various experts from the British Museum plus interested Wikipedia enthusiasts, some of whom attended the event, some of whom were online.

The result, after six hours of editing, was an incredibly detailed article on the Hoxne Hoard, fully referenced with 112 footnotes.

Such a process suggests a different way of approaching crowdsourcing – rather than individuals donating individual pieces of digitised material or related metadata, the individuals worked as a team, structuring their knowledge and expertise according to the basic rules for creating a Wikipedia article.

While the process could not be used for every single cultural item, in certain contexts it could be an incredibly powerful way of building up knowledge.


For more detail on the success of the event, have a read of Liam Wyatt’s blog post.
