High Volume Digitisation: Issues, Trends & Innovative Robot Tech

The workshop

This free workshop focuses on high-volume digitisation of bound materials, e.g. books, manuscripts, newspapers, magazines, registers and ephemera.

This is a growing area of interest for technical and project managers, practitioners and researchers, and the workshop will be useful for anyone interested in what are also known as ‘mass’ or Large Scale Digitisation Initiatives (LSDIs).

Bringing together experts, UK and European case studies, the latest technical developments and good practices, the day is intended to provide an outline awareness of what is involved in planning LSDIs for digital preservation or digital access. This includes LSDI feasibility, logistics, scaling, outsourcing, costing, risks, quality, metadata, OCR and FAQs.

The day will include real-time demos of robotic-arm scanners, with opportunities for two-way dialogue and open questions.

Location

The New Technology Institute, Birmingham

Date

Wed 23rd September 2009 (all day)

More Information & How to Register

For more details and the workshop programme please click here. Alternatively, email dcs@bcu.ac.uk or tel. (0)121 331 6350 and ask for Bev Dodd or Beth Delwiche for information or to register.

Deadline for registration

Fri 18th September 2009. Attendance is free, but places are limited and registration closes at the deadline, so attendees are urged to reserve places early to avoid disappointment!

This workshop is sponsored by JISC as part of the JISC Digitisation programme.

Book scanners: compare and contrast

For those considering large-scale book digitisation and the purchase of a book scanner, this brief report will help you weigh the pros and cons of some of the main book scanners currently available on the market.

Julian Ball, the author of the report, attended an event at the Munich Digitisation Centre (18-20 June 2008) where four vendors exhibited and demonstrated their scanners: Qidenus, Kirtas, Treventus and 4DigitalBooks.

The report lists basic specifications for each scanner, contact details and personal observations on the various products.

Julian Ball is the Manager of BOPCRIS, the Digitisation Centre based within the Hartley Library at the University of Southampton. He is also currently involved in one of the JISC-funded digitisation projects, 19th Century Pamphlets Online.

Download the report (PDF) on Book scanners Munich 2008.

Podcast: Listen to a podcast on the 19th Century Pamphlets digitisation project with Project Manager Grant Young.

The challenges of “useful” OCR

The National Archives’ digitisation project, British Governance in the 20th century – Cabinet Papers, 1914-1975, has been grappling with issues of “useful” OCR. It might be stating the obvious, but OCR is only as useful as the search results it produces.

War Cabinet paper

If OCRd text consistently misspells key words that are particularly relevant for retrieving certain documents, then searches on these key words will not always bring up the appropriate documents, and the results will lack accuracy.

For the National Archives, it was not enough to set acceptable OCR performance levels in purely quantitative terms, e.g. that OCR accuracy should not fall below 88%. If the remaining 12% of inaccurate text includes key words that users are likely to search by, discovery of the document is impeded. For example, if the word “submarine” is particularly relevant to the subject of a document and is consistently misspelt by the OCR software, the document is less likely to be found than if another, less relevant word had been misspelt. So even meeting an established minimum level of performance (e.g. 88%) does not necessarily mean that search results will be accurate or useful.
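The point can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration, not the National Archives' method: the sample sentence, the two invented OCR outputs, and the simple position-by-position word accuracy measure. It shows two OCR results with identical accuracy scores, only one of which can be found by searching for “submarine”.

```python
def word_accuracy(truth: str, ocr: str) -> float:
    """Share of ground-truth words reproduced exactly at the same position.

    A deliberately simple measure (hypothetical, for illustration only);
    real OCR evaluation usually works at character level with alignment.
    """
    truth_words = truth.lower().split()
    ocr_words = ocr.lower().split()
    matches = sum(t == o for t, o in zip(truth_words, ocr_words))
    return matches / len(truth_words)


def keyword_found(keyword: str, ocr: str) -> bool:
    """True if an exact-match search for the keyword would hit this text."""
    return keyword.lower() in ocr.lower().split()


# Invented sample text and OCR outputs (assumptions, not real project data).
truth = "the submarine fleet was ordered to patrol the northern approaches"
ocr_a = "the submarine fleet was ordered to patrol tbe northern approaches"
ocr_b = "the subrnarine fleet was ordered to patrol the northern approaches"

for name, text in (("A", ocr_a), ("B", ocr_b)):
    print(name, word_accuracy(truth, text), keyword_found("submarine", text))
```

Both outputs contain exactly one wrong word, so they score the same, yet only output A would be retrieved by a search for the key word.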

The National Archives are therefore also adopting a more qualitative approach to run alongside the quantitative one described above. They are concentrating on identifying the most relevant and most frequently misspelt “key” words across all of the OCRd documents. They then plan to run a global “search and replace” to reinstate the correctly spelt words.
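A global “search and replace” of this kind might look something like the following Python sketch. The correction table and the function name apply_corrections are hypothetical; the actual word list and tooling used by the project are not described here.

```python
import re

# Hypothetical table of known OCR misspellings (lowercase) and their fixes.
CORRECTIONS = {
    "subrnarine": "submarine",
    "cabinct": "cabinet",
}


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each known misspelling, as a whole word, with its correction."""
    if not corrections:
        return text
    # One pattern matching any listed misspelling as a whole word.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in corrections) + r")\b",
        re.IGNORECASE,
    )

    def fix(match: re.Match) -> str:
        word = match.group(0)
        fixed = corrections[word.lower()]
        # Keep an initial capital if the misspelt word had one.
        return fixed.capitalize() if word[0].isupper() else fixed

    return pattern.sub(fix, text)


print(apply_corrections("The subrnarine report went to the War Cabinct.", CORRECTIONS))
```

Matching on whole words only is a cautious choice: it avoids altering longer words that happen to contain a listed misspelling.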

Although this will have only a marginal effect on the overall accuracy ratings, it will increase the usefulness of the OCR to the end user.

Digitise a book in 15 minutes!

JISC recently met with representatives of QIDENUS TECHNOLOGIES, who are prototyping new robotic book scanning technologies.

QiScan RBSpro scanner

QiScan RBSpro is a fully automated robotic scanner that uses a robotic rubber “finger”, and no suction technologies, to turn the pages of a book. The “finger” senses the type of paper and the machine sets the right angle for handling the paper. The scanner has been successfully tested with 15th and 16th century books.

Key advantages of this new scanner, QIDENUS say, are greater workflow efficiency and lower labour costs, as one operator can work on up to five machines at the same time. Capture and post-processing activities, such as OCR, are very speedy, and the scanner is said to produce a digitised and searchable book in 15 minutes!

To see for yourself, you can attend the event at the Bayerische Staatsbibliothek München on 18-20 June 2008, where QIDENUS will be demonstrating their new products alongside those of their competitors.