Archive for May, 2008

Creating Keywords Automatically

There are an awful lot of interesting ideas to unpack in the Nineteenth-Century Serials Edition (ncse) resource mentioned in a previous posting.

For a start, there is a novel addition to the way results are shown: each search result displays the image reproduction of the page as well as the OCR’d transcription.

ncse1.jpg

There’s the whole range of partners involved in such a website, indicative of who needs to be involved to run an ambitious digitisation project.

ncse2.jpg

And the related conference brought up a whole host of intellectual questions related to integrating the work into scholarly research.

But what is most interesting is the project’s attempts to automatically give subject keywords to articles within their resource by using natural language processing.

ncse3.jpg

Each article in their website has been processed in two ways: firstly, to extract persons, places and institutions from the complete data; and secondly, to create subject terms (e.g. Arts & Crafts or Emotional Actions, States & Processes) which relate to each of the digitised articles in the collection.
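To give a rough feel for the second step, here is a minimal sketch of assigning subject terms from a fixed vocabulary. It is not how ncse does it (the project's actual natural language processing pipeline is not described here); the cue words, the `assign_subjects` function and the `min_hits` threshold are all made up for illustration, and only the two subject term names come from the resource itself.

```python
import re
from collections import Counter

# Hypothetical vocabulary: subject term -> indicative cue words.
# The cue words are illustrative assumptions, not taken from ncse.
SUBJECT_VOCAB = {
    "Arts & Crafts": {"painting", "engraving", "pottery", "sculpture"},
    "Emotional Actions, States & Processes": {"grief", "joy", "anger", "fear"},
}


def assign_subjects(text, vocab=SUBJECT_VOCAB, min_hits=2):
    """Return the subject terms whose cue words occur at least min_hits times."""
    # Count lower-cased alphabetic tokens in the article text.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each subject term by the total occurrences of its cue words.
    scores = {term: sum(words[w] for w in cues) for term, cues in vocab.items()}
    return sorted(term for term, score in scores.items() if score >= min_hits)


article = (
    "The exhibition of painting and sculpture drew crowds; "
    "the engraving was much admired."
)
print(assign_subjects(article))
```

A real system would need far more than word counting (OCR errors alone would defeat this), but the shape is the same: text goes in, a small set of controlled subject terms comes out, and nobody has to read each article by hand.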

This is handy for users because it bypasses the tyranny of having to use precise search terms to discover particular articles; and it’s useful for the digitiser because they do not have to go through each article individually and make manual decisions about the subject therein.

I’m not sure it completely works as yet (there are some faults in the results and the interface is not intuitive), but this is a brave and valuable step in trying to really exploit the richness of digitised resources, a richness we have not really tapped into yet.


Nineteenth-Century resources – Evolution or Revolution?

Members of the JISC team attended a conference to launch the Nineteenth-Century Serials Edition (ncse): a free, online edition of six nineteenth-century periodicals and newspapers.

ncse.jpg

The conference was interesting for a number of reasons, not least because it is an excellent model for getting groups of end-users involved in discussing and using such resources and actually thinking about the effects they have on their research area, in this case nineteenth-century print.

Patrick Leary, from the Society for the History of Authorship, Reading & Publishing, spoke about the ‘profound changes in the scholarly economy’ that the recent publication of digital resources related to the nineteenth century has caused. What, he and others asked, will research look like in five years’ time?

Others wondered if such changes would be revolutionary or ‘merely’ a huge evolution. There was some gentle disagreement as to whether there should be a push to explore new methodologies (i.e. revolution), or whether we should make sure that all academics can at least exploit the basics of utilising such resources (the evolution).

In any case, it is apparent that we are really still scratching the surface when it comes to answering these questions.


In the news: lags and legacies

The launch of a couple of digitisation projects has made the news this week. There’s excitement in the papers over the prospect of digging over some of the most sensational trials in British criminal history as the Old Bailey opens its previously unseen files to the public.

The Old Bailey Online website, published by the Humanities Research Institute, is a collaboration by the Universities of Sheffield and Hertfordshire and the Open University. Funded by the Arts and Humanities Research Council, the trials run to more than 110,000 pages of text and some 120 million words. In addition to the text of the trials, the website provides 195,000 digital images, as well as contemporary maps, images of the courtroom and information on the historical and legal background to the Old Bailey court.

The site is the largest single source of searchable information about everyday British lives and behaviour ever published, said co-director Professor Tim Hitchcock. ‘Besides the desperate drama of crimes punished, the proceedings give us a new and remarkable access to the everyday. History is full of information about kings and queens and wars, but there isn’t much that tells us about the everyday life of ordinary people,’ reports the Guardian.

According to the Times, the website ‘creaked under the strain’ as thousands of Britons, Americans and Australians rushed to search for news of the nefarious dealings of their distant ancestors in Victorian London.

And, who knows, perhaps it might even fill in some more of the jigsaw puzzle that is the Murder of Jean Alexander...

Meanwhile, Origins Network has launched an online searchable database of the contents of 28,000 wills from 1470 to 1856, cataloguing family feuds, dissolute daughters, thieving servants and all possessions great and small ever held dear. It provides, says the Guardian, “a vivid snapshot of social history”.

Finally, the Guardian’s Arts Blog has a lively discussion about online photography archives, following a long and interesting post by Liz Jobey in which she suggests that Britain is lagging far behind the US in terms of extent of and access to digitised image resources.
