JDCC09: Robert Miller: A California Digital Library

According to Robert Miller (Director of Books at the Internet Archive), the entire web can be stored in a 4m x 3m x 3m shipping container. Photographic evidence of this phenomenon was just one small part of the wide-ranging and entertaining second plenary session, which looked at what makes a good library, the Internet Archive, and some of its current projects.


Robert began his talk with a bit of context: “it’s grim out there”. Public libraries are looking at closing branches, funding is going down, and unemployment has reached 10% in California – all in all, these are challenging times.

And in the middle of all this comes the question: What makes a good library?

  • clear vision (you need a strong foundation, built on tradition, linked to a knowledge of future direction)
  • experience (should be user-controlled – users need to understand clearly and quickly what resources are available, get guidance on them, and be able to share their knowledge with others)
  • environment (whether physical or virtual it needs to be appealing, inviting, welcoming, inspiring and comfy – with possibly a fish tank!)
  • resources (do the collections meet the research needs? Are they broad and deep – or can you collaborate with someone else to achieve this? Do you have multiple users/copies? What do you do about users versus space?)
  • staff (are they friendly, life-long listeners, knowledgeable, open?)
  • communication (internal, peer-to-peer, external and through partnerships, which are changing rapidly)
  • sustainability (lots of issues about funding, relevancy to users, choices of services – can we offer everything we did in the past, or is there a trade-off, which may mean a reallocation of resources?)

Robert then gave us a quick, whistle-stop tour of the Internet Archive, which is headquartered in San Francisco.

What do they do?

The Internet Archive has three constituencies:

  • consumer – general
  • academic – researchers
  • library partners

It’s a non-profit organisation, with funding of around $10m from foundations, grants, government and libraries who pay for services.

Essentially, things go in, and some things go out. Their areas of work include:

  • TV archive – 20 different TV channels are covered (although it can’t be made public yet because of copyright)
  • audio – 400,000 items on the site in over 100 collections
  • moving images
  • wayback machine – began archiving the internet in 1996
  • texts – they have 1.4 million books online, some donated, some digitised (at a rate of more than 1,000 books per day)

How are they working to help libraries?

Robert gave some examples of how the Internet Archive is using relevancy, sustainability, evolution and innovation:

Relevancy and evolution

  • NASA Images (they host and share all of the video images and digital files in the public domain)
  • Biodiversity Heritage Library Project (trying to digitise one web page for every species known to man before they become extinct) – also an example of how they track activity: 750,000 people visit the site per day.
  • Open Content Alliance (which now has over 5,000 members including Lyrasis and CARLI)
  • University of Toronto (they help with storage and hold a duplicate copy of data)
  • Asami collection (examples of early movable type from Korea) – an example of repatriation as this material is stored in San Francisco but can now be sent digitally back to Korea. The server had to be modified so the material could be read from right to left!
  • Genealogy (working with the Mormons on the world’s largest genealogy collection) – the Internet Archive have donated equipment, and Mormon volunteers are digitising the material. An example of a new type of collaboration.
  • Yiddish literature online (they have 10,000 works online, which is half of everything published in Yiddish)

Innovation


  • testing tools – the scan on demand project
  • open library – the idea is to have one web page for every book ever published (they have about 1 million online at the moment, all fully text searchable)

Robert also outlined his vision of a conduit where book servers connect readers, lenders and vendors (obviously making sure that authors are paid, and copyright respected…).

Sustainability


The Internet Archive started working with Yahoo, and has also worked with Google and Microsoft – however, only Google is still active in the field of books. Robert pointed out that libraries generally take a view of 100 years, but that’s probably not the case for commercial organisations. This means that new models of sustainability need to be explored – Robert mentioned endowments, grants, sponsors and partners, but the recession is hitting, so new collaborations (for example with the Mormon volunteers) may be the way forward. Another option may be a “stimulus package”, as has happened in China and Japan.

Summary: What makes a good library?

Robert summed up his talk with a final word from a Chinese colleague, who suggested the three elements of a good library should be: resource (base), technical (instrument) and service (goal).

There was just time for a couple of questions:

Q: What are the pros and cons of collaborating with Google?

A: The pros include wonderful tools and services. The cons include trying to take public knowledge and privatise it. Robert thought that Google was near the end of what they’re going to digitise, but made the point that there’s a whole lot more available in public libraries.

Q: Is there an API? And “we will keep your stuff forever” – will you still be here in 500 years?

A: One portion of the Internet Archive is funded by an endowment…and they are already on their third generation of storage equipment.  And APIs are indeed available on the Internet Archive website.
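
For anyone curious what using one of those APIs might look like, here is a minimal Python sketch based on the Wayback Machine availability endpoint as publicly documented today (not something shown in the talk, and not necessarily what was on offer in 2009). The function names and the sample response are illustrative assumptions; the sample JSON is hand-written to match the documented response shape, so the snippet runs without any network access.

```python
import json
from urllib.parse import urlencode

# Assumption: the publicly documented Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build a query URL asking for the archived snapshot closest to a date.

    `timestamp` is an optional string such as "20060101" (YYYYMMDD).
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the URL of the closest archived snapshot, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hand-written sample in the documented response shape (not real output).
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101000000/http://example.com/",
            "timestamp": "20060101000000",
            "status": "200",
        }
    }
})

if __name__ == "__main__":
    print(build_availability_url("example.com", "20060101"))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

To try it against the live service, you would fetch the URL returned by `build_availability_url` with any HTTP client and pass the response body to `closest_snapshot`.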

Robert closed with a final offer to everyone: the Internet Archive would be glad to digitise 10 books as a starter. He’ll give them back to you, along with a URL so you can play around with the digitised versions and start to think about the way forward with digitisation.
