The first session in the Managing Content strand investigated the real costs of digitisation projects and the hidden costs they often contain, as well as effective ways for public and private bodies (the latter generally publishers) to work together to create digital resources. Moderated by Grant Young, a Digitisation and Digital Preservation Specialist at Cambridge University Library, the session included talks by David Tomkins, from the Bodleian Library, University of Oxford; Peter White, from ProQuest; Bill Pidduck, the Chairman of Adam Matthew Publications; and Simon Tanner from King’s Digital Consultancy Services, King’s College London.
A full house, thankful for the functioning air-conditioning, received, as Grant Young joked, four talks for the price of three since David Tomkins and Peter White were doubling up in the third time slot to speak about their fascinating work on the digitisation of the John Johnson ephemera collection.
First though, Simon Tanner (freshly returned from Stockholm where he had been giving a Nobel lecture) gave the benefit of his experience as a trouble-shooter to provide a realistic outline of the kind of costs – and particularly hidden costs – that digitisation projects might encounter.
His intention, he stated, was to be “informative and contentious”, functions he fulfilled admirably.
Areas where he suggested hidden costs might be encountered were:
Time and planning
- If the allocation of time against tasks hasn’t been measured accurately enough, costs will mount.
- It’s very difficult to estimate how much time and effort is needed for transmission of metadata, and easy to overestimate how much of the work technology can do.
- Although character recognition works well, you mustn’t assume it solves all problems, and you must factor those potential pitfalls into your plans.
- How do you allocate staff time? Be careful about the assumptions you make.
IPR and agreements
This, he noted, was “pretty damn obvious” – but he wasn’t pretending to be “anything but obvious”. It’s worth repeating the problems that can arise – especially when you only have 6-8 weeks to respond to a funding call and it becomes all too easy to assume that the memorandums of understanding you have for your pitch are agreements… But they aren’t! It takes time to get agreements, noted Simon. He added that this often becomes a hidden cost “because we don’t like to admit it is a factor” or because we don’t realise how much it will cost…
You need to avoid having staff sitting around… either because they’re present when they aren’t required or because it takes time getting the material they need to them. This becomes more difficult the more collaborative partners you have.
There are many things you are expected to do that no one wants to fund. For instance, legal deposits in libraries, or the need to keep things available permanently. An academic friend of Simon’s has a t-shirt bearing the slogan “perpetuity is a very long time”.
Often we have funding to make stuff – but not funding to keep it available. While open access is undeniably useful, it becomes very problematic in terms of funding.
“So do you criticise people for charging?” asked Simon. “How do they pay for it? Is there a better way for us to support those who would like to be free, but have to charge to fund themselves?”
Foundations who hand out money for content creation tend to leave projects high and dry after funding runs out. So how do the projects keep going? They rarely generate enough money. They all require some level of institutional support. “I think it’s safe to say again that institutions are good,” said Simon, pointing out that now that the US has become “a socialist nation”, it’s no longer heresy to suggest that we can’t just rely on the market.
He followed these cheeky remarks with a closing thought: “Is the value in the glass, the wine or the drinking?”
Representing Adam Matthew Publications, Bill Pidduck spoke about how external publishers can help libraries provide value for money.
He noted that online special collections can help libraries:
- To preserve the collections they hold.
- To enhance access to resources for existing users.
- To bring special collections within the library catalogue.
- To build up a reservoir of images on campus.
- To broaden access beyond the campus.
A good example, he said, would be the British Cartoon Archive, with its 130,000 cartoons – clearly a wonderful resource for students increasingly interested in visual culture.
But great as such collections are, they’re also expensive since libraries will face many costs relating to:
- The IT infrastructure necessary to support the project.
- The time needed to create a wireframe and design the functionality of the site at the beginning.
- Image capture.
- Cost of OCR and double keying.
- Tagging images and creating metadata, which is super time-consuming.
- Front end design.
- Hosting the site and maintaining the platform.
Yet, said Bill, when asked if digitisation can continue in the current climate, he would give a resounding ‘yes’. Even though grants are drying up and government funding will be difficult, there is plenty of scope for more co-operative partnerships between private publishers and libraries.
“Adam Matthew have 25 major projects available out there”, he said, before moving on to a detailed analysis of two of these.
The Mass Observation project
The inter-war period was a time when the middle and upper classes knew less about the working majority of the population than perhaps at any other time in history. Tom Harrisson (the eccentric anthropologist progenitor of Mass Observation) explained that he knew more about “the Cannibals of the New Hebrides than the working populations of the North of England.” So together with a surrealist poet and a documentary film-maker he decided to document the lives of ordinary British people. The result was a massive resource, now curated by the University of Sussex, containing hundreds of file reports from professional observers, diaries kept by citizen volunteers (including among other things some fascinating accounts of life in the Blitz), photographs, films… Rooms and rooms full of material that used to be accessible only in Sussex and used to be extremely difficult to hunt through.
Digitisation has revolutionised access to this material, making it easy to search through and more widely available. It’s beneficial to the library because:
- They now have a huge section of the archive digitised at no cost to the university – even though they now own and can use these images.
- Fragile material can now be withdrawn from use.
- Material is accessible across campus.
- It’s more searchable and easy to use.
- Scholars wishing to consult the archive can be told of institutions that hold it.
- Material is now used by scholars around the world.
- The university gets royalties from it.
His second example concerned the Perdita project, containing collected writings of women pre-1700, again “a great way to aggregate material that isn’t available otherwise” and again a mutually beneficial arrangement for Adam Matthew and the institutions involved: AM because they get to sell on the resource, the universities for the reasons outlined above.
In conclusion Bill said that AM bring to the table:
- An IT infrastructure.
- The ability to create a bespoke solution for each archive.
- The ability to work with leading partners to capture images.
- Access to OCR and double keying.
- Front end design.
- Hosting and platform maintenance.
An advertising pitch maybe, but one that can provide real benefits.
David Tomkins and Peter White
David and Peter built on Bill’s presentation by talking of their own experience working as a public-private partnership on the digitisation of the John Johnson collection.
The John Johnson collection of ephemera is an amazing resource containing more than 65,000 items of ‘ephemera’ (everyday items designed to be cast aside) requiring in excess of 170,000 digital images.
David Tomkins started off by explaining the benefits of digitisation (there are serious issues of preservation relating to people thumbing through ephemera and of access with regard to seeing the physical collection in the Bodleian), before briefly explaining why the library opted to use the services of ProQuest:
- They provide experience.
- They assume the cost of creating and sustaining an online resource with search facilities.
- They have expertise in rights management and clearance.
- They provide tech support – and respond to user queries.
- Economies of scale.
- They can provide access to the resource in the UK.
Peter White then demonstrated the efficiency of the search function on the archive by keying in “value for money”. This brought up dozens of results that would have taken hours and hours to root out in the physical collection. He then focussed in on one of these, an advert for a £1 hat, and showed the wealth of information provided to contextualise this image.
The basic business model, Peter said, is that ProQuest sells the resource to institutions outside the UK; this pays a royalty to the Bodleian and helps fund free access in the UK.
He noted that he’s aware that this “boutique digitisation” with such astonishing search facilities seems like a luxury in the current climate, but it’s also essential for projects like this to work and to provide long-term value for money, because:
- Non-standard material requires expert description to make it available. It’s worth doing properly.
- The more sophisticated your cataloguing is, the better the search interface has to be.
- Rare and fragile materials require conservation.
- You only get one shot at this huge amount of work… You have to make sure it will last and that it’s “future proof”.
- There has to be a structure in place to overcome the challenges associated with identification and clearance of copyrights.
“To reconnect people to their heritage, you need to do this kind of work,” he concluded.
David Tomkins picked up the baton again to talk about the downside and potential pitfalls of such partnerships. “Low sales is a potential problem,” he noted. They won’t know if the archive is a success for a long time (it isn’t even fully live yet) and it’s a bit of a gamble as to whether it will generate sufficient money to sustain itself.
He also noted there’s a problem of geographical exclusivity built into the business model that can breed resentment in those that can’t access the material and make overseas collaboration tricky. And since the project is not Open Access they miss out on potential funding opportunities.
Further challenges they have encountered include:
- Specified timescale. It’s a big project with a quick turnaround – and like all such, it’s bound to run late.
- Hidden staff costs (including redundancy and pension contributions for staff whom the project absorbed from elsewhere in the institution).
- Cataloguing is expensive.
- Digitisation costs: “We don’t know how many scans there will be until the very end of the project”.
- Capital claims relating to hardware and software: “You need to guess what you’ll need a year or so down the line.”
David then laughingly noted that there was no time left to discuss ProQuest’s problems. A series of interesting questions were then taken from the floor relating to the problems surrounding the long-term sustainability of bespoke systems and their complex, individuated search engines. “Sustainability depends on marketability,” said Peter White. “It’s not an option to have a simpler interface for this project.”
And on that strident note, the audience filed out, with plenty to chew over during the tea-break.