[Owen Stephens provides a progress update on the Spotlight on the Digital work.]
In our investigation of digitised collections on the web, part of the Spotlight on the Digital project, we’ve been looking at how both collections and the individual items within them are presented, and how ‘discoverable’ they are using Google.
To start with we took a list of 140 projects, and assessed the following aspects:
1) Did the URLs for the collections still work?
2) Did the collection, and individual items in the collection, have ‘well formed’ URLs? URLs were ranked Good, OK or Poor based on the recommendations made in three key documents:
Google’s advice to Webmasters
Tim Berners-Lee’s “Cool URIs don’t change” document
The W3C document “The use of Metadata in URIs”
3) Did the page have an appropriate title (as given in the HTML title tag)?
4) Did the collection or item appear on the first page of results given a Google search for its title or name?
5) Did the collection or item appear on the first page of results given a Google search for relevant keywords?
Out of the 140 projects, so far we have good data for 90 (the other 50 were assessed but don’t have complete findings for various reasons). Of these 90, we found that 7, or 8%, were no longer accessible at the given URL. Given that the oldest of these projects started in 1998, having problems accessing only 8% of them seems pretty good.
In fact, as we know some collections were moved rather than entirely shut down, the survival rate for the digitised resources is likely to be higher than the 92% we found. For example, “NewsFilm Online” was originally hosted at http://www.nfo.ac.uk. While this URL no longer works, we understand that the content that was hosted there is now available via the Jisc MediaHub.
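For anyone wanting to check the figures, the arithmetic behind the 8% and 92% can be sketched as below. The helper name `percent` is hypothetical, and the results are rounded to whole percentages as in the text.

```python
def percent(part, whole):
    """Return part as a percentage of whole, rounded to the nearest whole number."""
    return round(100 * part / whole)


assessed = 90      # projects with complete findings
inaccessible = 7   # projects no longer reachable at the given URL

inaccessible_pct = percent(inaccessible, assessed)  # 7 / 90 is about 7.8%, rounded to 8
survival_pct = 100 - inaccessible_pct               # the remaining 92%

print(inaccessible_pct, survival_pct)
```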
Something that immediately stands out from this investigation is how much better the collections fare than individual items from the collections, when held up against these measures.
For example, while almost 100% of collection home pages had an appropriate page title, fewer than 50% of item pages managed this.
Similarly, almost 100% of collections appeared in the top 10 Google results when the collection or project name was used as the search term, but only about 50% of items appeared on the first page of Google results for the item name or title.
The charts above give an overview of how collections and items performed against these measures, and while we have further to go in terms of collecting and analysing the data, it seems likely we will see this pattern repeat itself.
While none of this seems particularly startling, it does start to suggest that work that helps improve the way specific items are made discoverable on the web would be of particular value.