#UKMHL Collaboration OCR UK Medical Heritage Library workshops

GW4 Archives: exploring UK Medical Heritage Library and Historical Texts as data

In recent years hack-days have been all the rage and have proved a good vehicle for interactions between people who normally might not work together. In academia there has been a trend towards running so-called ‘labs’. The word implies experimentation; hack-day tends to imply coding (it can be experimental!), whereas ‘lab’ suggests that it can be about experimental thinking, without necessarily needing to lead to the production of code. Code can still be an output of course, but that is not the main point of running a lab. It’s much more about the ideas.

Under the banner of the UK Medical Heritage Library, we at Jisc have been running events we call ‘Live Labs’, during which we work with academics and students to explore how we deliver the thousands of 19C medical texts available on our Historical Texts website. The labs set out to challenge the idea that these texts are just a bunch of digitised old books. They form part of the web and, even if they are not currently represented as linked data, they are linked through their metadata to texts elsewhere on the network. Through their access and use in academic discourse they also become apparent to new audiences.

The events also seek to identify new ways in which people want to interact with the content in its various manifestations: as an individual item, a corpus, an aggregation, links in a web of related data, the metadata itself, images and so forth. Participants also want to investigate the effectiveness of the various web interfaces, the discovery tools and the data itself, as well as the relationship between the book as physical object and its electronic manifestation.

The labs are not an end in themselves and we hope to develop some case studies which draw on the initial insights the labs provide. The case studies will ultimately act as way-pointers for the onward development of our Historical Texts service but should also inform the wider dialogue about the usefulness of electronic archives.

Recently, I was pleased to work with the GW4 Archives team to put together a great programme of activities focused on the content of our Historical Texts service, including the UK Medical Heritage Library content. The GW4 Archives team decided to call their event a hack-day, which meant that people probably expected to be getting their hands dirty with code. In the end no one wrote any, but they did take a deep dive into the fabric of the Historical Texts site and were also able to explore the possibilities of taking content and working with it using freely available tools and Wikimedia platforms such as Wikidata, Wikisource and Wikipedia. Colleagues from Cardiff University, the University of Bath and the University of Bristol convened in the John Percival Building at Cardiff University for a jam-packed day.

We had run a Live Lab at the launch of the UK Medical Heritage Library with Owen Stephens and we were pleased that Owen was again able to lead a session in Cardiff. After some contextual comments from Anthony Mandel, a brief presentation from Keir Waddington (a member of the UKMHL Advisory Group), some remarks from me and an introduction from Leah Tether, Owen demonstrated the possibilities offered by the Historical Texts collections: EEBO, ECCO, British Library 19C Books and UKMHL. Owen unpacked a range of possibilities for investigating the content as pure data, as images and as a cross-searchable resource, and explored the various interfaces which provide access to these content types. Participants came up with ideas for improving the search and suggestions for making better use of the UKMHL visualisation tools. They also identified content they might be able to explore in their own research. We gained lots of feedback on how the content is delivered and now understand more about how researchers think about these archives: as a way to identify content for their research, as a means of eliminating items from that research, and as a means of locating the physical artefact and supporting decisions about visiting a particular library to see something in the flesh. Some felt they would like to use the Historical Texts API but would need more documentation or training to do so. Some felt they would like to develop more skills in using these technologies and perhaps also start doing their own coding.

Digital skills development was a major concern for many of those taking part. Overall, people were excited by the extent of the UKMHL corpus and suggested that it should be the starting point for anyone wanting to explore 19C advances in medicine. We also looked at tools and learning resources which already exist, such as Voyant Tools, OpenRefine, the Programming Historian and Library Carpentry, and which enable people to work with Historical Texts content. This strand of Owen's session brought us nicely to another set of openly available resources: those provided by the Wikimedia Foundation.

We were fortunate that Martin Poulter was available to take people through the use of platforms provided by Wikimedia. Martin is currently working with Oxford University after a stint as the Bodleian Wikimedian in residence. He was previously Jisc’s Wikimedian (see our guide: Crowdsourcing – the wiki way of working) and he focuses on the use of these tools to support the dissemination of academic knowledge. The idea is to encourage academics to get involved in contributing to amazing free resources such as Wikidata and Wikisource. The benefit is that the general public can become aware of new knowledge which has emerged through academic research.

After an introduction from Jenny Kidd, Martin asked participants to work on a text which had been imported from the UKMHL corpus into Wikisource as raw OCR (Optical Character Recognition) text, with each person allocated a page to edit. By the end of the session, we had a corrected publication sitting on Wikisource. Owen had already looked in some detail at issues around dirty OCR on UKMHL, and the ability to create a clean version of a text on Wikisource revealed the power of this freely available technology.
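
For anyone curious about what a first automated pass over dirty OCR might look like before human correction, here is a minimal Python sketch. It is not what we did on the day (the correction was done by hand on Wikisource), and the function name `clean_ocr_text` and the sample text are made up for illustration. It only handles three common problems: the long s (ſ) found in older printing, words hyphenated across line breaks, and stray line breaks and spaces.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Apply a few conservative, illustrative fixes to raw OCR text."""
    # Replace the historical long s (U+017F) with a modern s.
    text = raw.replace("\u017f", "s")
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks into spaces, keeping blank-line paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


# Hypothetical sample, not taken from the UKMHL corpus.
sample = "The phy-\nſician  obſerved the\npatient.\n\nNext paragraph."
print(clean_ocr_text(sample))
```

A pass like this will sometimes get things wrong, for example by merging a genuinely hyphenated compound that happened to fall at a line end, which is one reason page-by-page human proofreading of the kind Wikisource supports remains valuable.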

Martin also demonstrated how original research, though not directly present on Wikimedia platforms, can be shared by referring to papers in article texts, by creating stand-alone wikis and by adding links in Wikipedia entries to external resources where these are appropriate. He also showed how a historic document can reach a wider audience through being represented on crowdsourcing platforms.

Once a paper has been published and referenced in other academic works, there is no reason why it can’t be added to Wikipedia as an article, as long as the article follows Wikipedia conventions. The journal PLOS Computational Biology is a good example of how this can work. When articles are published in the journal, a matching article of record is produced which follows the Wikipedia style, and these are uploaded onto Wikipedia to fill gaps in its Computational Biology section. They are then available for editing in the normal wiki way. Martin also highlighted initiatives which allow digitised material to be added by cultural organisations to Wikimedia Commons and Wikisource. We had a look at Wikidata and explored how it is creating a web of data by enabling links to open data sets. Wikimedia Commons, the repository of reusable media files, should also form a valuable source of photographs, maps and audio clips for academics and students. We created our own timelines in Histropedia and tried out the new referencing tool in Wikipedia, which will make referencing much simpler in future. So, we came away with a clear idea of what we could do with UKMHL texts using these tools. Martin kindly did a Storify of the day.

At the end there was a feedback session during which participants discussed the usefulness of the structured data on Wikidata, the tools available on Historical Texts, the need to think about ethics in relation to contributors and readers in a shared knowledge environment, and how interacting with these kinds of collections, for example visually, is changing the nature of scholarship.

Such events draw our attention to the increasing need for skills in managing humanities research and for the skills necessary to the effective delivery of teaching. If we are to level the playing field between STEM and humanities subjects, we need to recognise that effective delivery of the humanities can and should lead to students leaving institutions with a high level of digital skills. By implication, those teaching humanities subjects need to be supported by infrastructures that enable the development of their own skills and maximise their ability to teach well in an electronic environment populated by archives, libraries and tools.

By Peter Findlay

Subject Matter Expert, Digital Scholarship, Content and Discovery, Jisc

Working with Jisc's Higher Education members to improve access to their special collections in the age of data-centric arts, humanities and social science research.
