
Conference 2007: Richard Ovenden, Mike Keller: Research library roles and priorities

What are the roles and priorities for research libraries in the UK and US in the delivery of e-content?

Richard Ovenden, Keeper of Special Collections (Associate Director), Bodleian Library, University of Oxford
Michael Keller, University Librarian, Stanford University

We want to talk about the elephant in the room, the love that dare not speak its name…

RO: Google has been tangentially referenced in the discussion today, and we thought it might be useful to share what we're allowed to share with you. It's been an interesting experience in Oxford being part of a growing network of libraries brought together by Google. Let me kick off with a few comments about what we've been doing.

In Oxford, we are digitising 19th-century material in the public domain. We've taken the IPR aspect very seriously and avoided anything that would cause difficulty. It's not a vastly complicated project – taking books, digitising them and putting them back on the shelf. We've also taken pains to ensure careful handling of the books: many of them came through the legal deposit privilege, so we have a responsibility to take good care of them. The project has been industrial scale, and that's the difference for me – this is on a different planetary level from the JISC projects. It's a huge logistical effort: organising the move of hundreds of thousands of books from 40 buildings, moving them around Oxford and getting them back on the shelves as quickly as possible. Selection is easy – there is none. Every book we can lay our hands on from the 19th century. You can already see them on the Google Books interface.

Expectations have changed – I was berated last year for the journals which weren't yet up. We'd only been doing it for four months! The pace is incredibly fast and expectations run ahead of it. We refer to the project as The Beast, and keeping The Beast fed is what we spend our time doing. One of the unexpected things we've learnt is just how little we know about our collections. One of the great lessons is that we need to get back to the shelves and learn what we've got there, what condition it is in, and the actual content itself. We don't know enough about it. And we've discovered pockets of books that were never catalogued. The next phase is integrating that content into new services.

MK: We've been sending public domain works to Google and developed the 'copyright terminator' – a transcription of all of the US copyright records for 1923–63, some 4–5 million of them. Not all the renewals actually occurred, so many of those works are now in the public domain, and we now know a lot more about what is not protected by copyright. We're sending 1,700 documents a day over to the Google lab, and in the process have discovered about 8,000 books that need conservation treatment.
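The renewal check behind the 'copyright terminator' can be sketched in a few lines. This is a hypothetical simplification, not Stanford's actual data or code – the titles are invented, and the rule is reduced to its core: works registered in 1923–63 whose renewal was never filed have since fallen into the public domain.

```python
# Hypothetical copyright records: (title, registration_year, renewed?)
records = [
    ("A Study of Tides", 1930, False),
    ("Modern Chemistry", 1950, True),
    ("Rail Atlas", 1961, False),
]

def is_public_domain(reg_year, renewed):
    """Works registered in the US between 1923 and 1963 required a
    renewal filing; those never renewed are now in the public domain.
    (A simplification of the rule behind the renewal transcription.)"""
    return 1923 <= reg_year <= 1963 and not renewed

# Filter the records down to works now safe to digitise and publish.
pd_titles = [title for title, year, renewed in records
             if is_public_domain(year, renewed)]
```

Run over all 4–5 million transcribed records, a filter like this is what turns the raw registration data into a list of works known not to be protected.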

At Stanford our expectations are, firstly, that this is an indexing project. Readers can then find the book in a library near them, at booksellers, or from print-on-demand suppliers. Indexing is important because it leads to an increase in use of the collections; indexing, searching and other web services are highly valued by 85% of readers of journals.

Secondly, we will use the digital surrogates created by Google as preservation copies.

Third, we will be indexing these works in new ways. Taxonomic indexing allows indexing not by words but by expressions, so we can create new linkages and a fingerprint of articles for more precise matching of results. Citation linking is also incredibly valuable. And there are new kinds of searching – associative searching computes a vector expression of any text in any language, which you can compare to pre-computed vector expressions of other texts to get a very close match. Very brief records can produce amazing matches. We have experimented with the contents of Science magazine and the results are very good.
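The talk doesn't specify how these vector expressions are computed. As a minimal illustration of the associative-searching idea – a query text turned into a vector and compared against pre-computed vectors of a corpus – here is a bag-of-words cosine-similarity sketch; the corpus entries and names are invented for the example:

```python
from collections import Counter
import math

def vectorize(text):
    """Crude bag-of-words vector (word -> count); a stand-in for the
    richer vector expressions described in the talk."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Pre-computed vectors for a tiny corpus (titles are hypothetical).
corpus = {
    "bodleian": "catalogue of nineteenth century books in the bodleian library",
    "physics": "experiments in electricity and magnetism",
}
vectors = {key: vectorize(text) for key, text in corpus.items()}

# Even a very brief record can land on the right match.
query = vectorize("nineteenth century library books")
best = max(vectors, key=lambda key: cosine(query, vectors[key]))
```

Because the comparison is against pre-computed vectors rather than a word index, the same machinery works across languages and with very short records, which is the property the speakers highlight.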

Fourth, we will be able to use the digital surrogates as a testbed for new research at Stanford – literary, historical and linguistic investigations – relying on the fair use defence of US copyright law.

RO: I agree with all that Mike's said. There has been a lot of discussion about e-science for the humanities, and this project gives us the scope to be serious about doing that. It also has scope for the health sciences, with such a huge corpus of material. We chose the 19th century because it is in the public domain, but also because it would extend the EBO and ECHO work and add value to it, continuing the historic corpora of material.

One of the other useful things is the idea that it is a search project – another index to the Bodleian Library, which allows researchers to make more efficient use of their time. For Google, metadata was initially just an inventory, but they are now starting to appreciate the cataloguing information. We now realise that about 18% of our collection isn't catalogued at all, and it's time to get back to basics with cataloguing.

MK: I disagree…

RO: But metadata is very useful for all sorts of reasons – we need to have that kind of information.

MK: I'm not arguing against metadata… the story unfolds… What hasn't worked very well? We've tested several different aggregators and they are limited, so we've been trying to think of a better method for cracking lots of codes simultaneously. We decided to start closer to the customer: devising a simple interface that will start with a limited set of files to interrogate but will eventually lead the reader down several different paths to several different research results. The Google digitisation project simply accelerates some things we were trying to do anyway. We are going to be assembling e-objects from the digital surrogates we get back. We know we can do it, as the cost of memory etc. is coming down, and we think it's a matter of doing some smart development. I think the course of the next 10 years will move us dramatically forward in the amount of content that can be indexed and in the quality of the indexing.

RO: Challenges… in the broader realm of e-content there is the issue of cultural heritage and born-digital (or 'only-digital') material. The whole rights landscape will change in ways we can't predict; the tension between open and restricted will be critical; and I think the fragility of the skills base will be in danger of being exposed as we do more digitisation. We need to develop the expertise to make use of this material in more challenging ways, but also to understand what the original artefact means, so that the digital representation is as meaningful as it can be.

MK: There is a sense of a large, complicated, deep matrix of architectures and systems, both local and distant. Google Books will eventually spit out stable URLs for the books, which means we can do things with those URLs and go easily from a reference to the book. Incredibly important. I see the possibility of more and more of that happening. The interplay of local architectures and systems with distant services will become more and more interesting, and a model for future behaviours. What a digital repository is today will be different in five years, and incredibly different in 10 years, because of improvements in memory devices.

RO: The Loughborough report – part of which generated the argument behind the Phase two programme – called for a UK framework for digitisation. Points from it still need to be addressed and taken forward.
