Update on JISC Content programme

Projects in the JISC Content programme are now about 6 months into their development. After an initial settling down stage and bringing teams and documentation together, they’ve been getting their teeth into the nitty gritty of the work.

Drawing on projects’ blogs and their own reflections on recent activity, which can be accessed on the programme’s Netvibes pages, these slides highlight the main areas of work teams have been involved in.




Help improve online search for humanities resources

The University of Sheffield is undertaking research with the intention of improving search within the humanities. The AHRC-funded project called ‘Participating in Search Design: A Study of George Thomason’s English Newsbooks’ is a collaboration between the Humanities Research Institute and the departments of History (Professor Mike Braddick, Pro-Vice Chancellor for Arts and Humanities), English (Dr Marcus Nevitt) and Sociological Studies (Dr Bridgette Wessels).

We are seeking participants ranging from PhD students to Professors in the research areas of History, English Language, English Literature, Politics and Journalism to answer a short survey about their current research practice, including their overall understanding of search and the advantages and drawbacks of web-based vs. more traditional text-based methods.

If this applies to you please follow the link to the survey below. It will only take 10 minutes of your time and your opinions will be of great value to our work and the wider impact of the project.

https://www.surveymonkey.com/s/researchpractice

The knowledge gained from this will be used to inform the design of better search interfaces for online resources, which genuinely meet the needs of the research community. The test dataset is approximately 50,000 pages of 17th century newsbooks collected by George Thomason.

The survey will be open until Monday 16th April. Results will then be fed back to survey respondents and there will also be the opportunity for further participation in the project. The survey is completely anonymous and complies with the University of Sheffield’s Ethics Policy.

Please contact Keira Borrill for further information about the project or follow us on Twitter and visit the project blog.


Vote for the Great War Community Collections

Great War Collections (which now includes the JISC-funded Great War Archive and Europeana 1914-1918) has been entered for the EngageU Award, a European Competition for Best Innovations in University Outreach and Public Engagement.

The public can vote for this project until 19 April.

The Great War Collections started as a JISC-funded project, the Great War Archive, from the University of Oxford, back in 2008, as the sister site to the First World War Poetry Digital Archive.

Since then it has expanded to the rest of Europe and has engaged over 2,100 members of the public in the UK, Germany and Luxembourg to capture over 30,000 images of personally-owned memorabilia from the First World War.


Digital Humanities Congress 2012 – Call for papers

The Digital Humanities Congress is a new conference which will be held in Sheffield every two years. Its purpose is to promote the sharing of knowledge, ideas and techniques within the digital humanities.

Digital humanities is understood by Sheffield to mean the use of technology within arts, heritage and humanities research as both a method of inquiry and a means of dissemination.

Proposals on all aspects of digital humanities are welcome, and the deadline is 30 April 2012.

For more information see the conference website.


How JISC Content projects are tackling Web usability

User engagement, both online and offline, is an important process for any project delivering successful web-based resources.

Web usability is arguably part of that process, and we were keen to expose projects within the current JISC Content programme to the principles of user centred design early on in the planning of their online resources.

Many projects recently attended a Web usability workshop run by Stuart Church, from pureusability, and blogged about the range of activities they’ve set up for defining their key users, gathering information on users’ needs, and seeking users’ input into the initial mock ups of their web sites.

All projects’ blogs can be found on the JISC Content programme’s Netvibes web pages.

My top picks are:

- ENGrich (clustering visual resources on engineering) posted some initial wireframes showing not only what their search results pages might look like but also how they might use paradata (data from users on how resources are being used) to add value and context to the resources surfaced through the search.


- Before attempting wireframes, the OVAM (Online Veterinary Anatomical Museum) project used a visual mind map to represent how the resources aggregated by the virtual museum might be arranged. One can see different routes into the content according to type of user, media type of resource, species and systems.

- In the post First designs and focus group, Manuscripts Online gives a sneak preview of different home page layouts the team showed to a focus group of users. There’s also a link to documentation on website design and user testing that other projects might find useful, covering the web design brief, Web2 functionalities and the focus group report.


- OBL4HE (Object Based Learning for Higher Education) has conducted some preliminary research on students’ use of online resources and has published its report here. It found that, above all, online resources need to be

clearly relevant to students’ course tasks and fulfil a need not already met offline


Feedback on OCRing tabular data

Last week, we asked for feedback on your experience of OCRing tabular data.

Christy Henshaw, working on the JISC funded digitisation of the Medical Officers of Health reports, has summarised the responses received so far:

I recently posted a request to digital library mailing lists, asking the community to share their experiences and knowledge about encoding historical tabular data. It was also posted on this blog. Since then, I have received many useful emails, documents and links from others with experience in encoding tabular data from historic documents. A big thank you to everyone who got in touch. Here is a summary of what I’ve learned from these responses.

The original records. You need to know your content in order to make sense of it in a digital format as layout is all-important. Tables come in many configurations, and their layout must be assessed before digitisation to make sure the data are transferred correctly.

For example, you may need to split single columns with two sets of data in each cell into two columns, or include dashes that were printed in the original to indicate missing information. If you know your tables, you can identify natural checksums that allow you to test that the data add up correctly, or make sense when loaded into statistical software such as SPSS. If possible, it is useful to make a note of printer errors (although I imagine that will come to light during QA – when something looks incorrect in the digitised version, but turns out to have been wrong in the original). I imagine you could at least try to get a sense of how good the editing was and whether to anticipate errors.
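As a rough illustration of the column splitting and checksum ideas above, here is a minimal Python sketch. The row data, the "/" separator and the function names are all hypothetical, invented for this example rather than taken from the Medical Officer of Health reports.

```python
def split_paired_cell(cell, separator="/"):
    """Split a cell such as '12/7' into two values; a printed dash means missing data."""
    parts = [p.strip() for p in cell.split(separator)]
    return [None if p in ("-", "") else int(p) for p in parts]


def row_checksum_ok(values, printed_total):
    """Return True when the rekeyed values add up to the total printed in the table."""
    known = [v for v in values if v is not None]
    return sum(known) == printed_total


# Hypothetical rekeyed rows: district, "males/females" printed in one column, printed total.
rows = [
    ("North", "12/7", 19),
    ("South", "4/-", 4),
    ("East", "3/5", 9),  # deliberately inconsistent: a keying error or a printer error
]

for district, paired, total in rows:
    males, females = split_paired_cell(paired)
    status = "ok" if row_checksum_ok([males, females], total) else "check against original"
    print(district, males, females, total, status)
```

A failed check only says that the row needs a second look; whether the mistake was made in rekeying or by the original printer still has to be settled against the page image.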

OCR v. rekeying. Optical character recognition (OCR), which programmatically decodes text from images, rarely – if ever – works for tabular data. My impression is that OCR engines may accurately pick up the words and numbers in the table, but are not configured to reproduce the layouts. As already stated, layout is key! Therefore, rekeying is almost universally done. Some have OCRed first, and then corrected the tables by hand, but in most cases the advice is to not waste time OCRing first. In our case, if we decide to rekey every table in the reports (it can range from 0 to 150 tables in any single report), we might as well rekey the whole report and dispense with OCR completely.

Output formats. During rekeying the text can be marked up in XML or HTML, flexible data formats that can be made available as is and/or converted to other formats. Tables in these documents can be marked up, and I assume extracted for reuse as raw data.

For a good example of what historical tables can look like in HTML see Statistics New Zealand’s digitised year books. HTML table mark-up isn’t complicated, as long as you set the formatting rules for translating a printed table into an electronic one; see www.w3.org/wiki/HTML_tables.
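To give a sense of how a table marked up in HTML might later be extracted as raw data, here is a small sketch using Python's standard `html.parser` module. The sample table and the `TableExtractor` class are made up for illustration, and the sketch assumes simple tables: it ignores merged cells (`colspan`/`rowspan`) and nested tables, which real historical tables may well contain.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample table; the figures are invented.
sample = """
<table>
  <tr><th>Disease</th><th>Cases</th><th>Deaths</th></tr>
  <tr><td>Scarlet fever</td><td>42</td><td>3</td></tr>
  <tr><td>Diphtheria</td><td>17</td><td>-</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(sample)
for row in parser.rows:
    print(row)
```

Printed dashes are kept as text here, so the decision about how to treat missing values is left to whoever reuses the data.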

Searching the data. Searching within tables is useful if the tables are very large. Users can then drill down to the specific areas of the table they are interested in. I can see this could be useful for our reports, where we have tables showing instances of notifiable diseases across different sectors of the population, for example. See www.tandf.co.uk/journals/titles/01615440.asp for a paper on metadata for a statistical database (we don’t subscribe to it so I haven’t been able to read it).

We could look into constructing queries based on the full-text data in our Library Catalogue that somehow merges the word search with a structural search. This search would of course result in a list of catalogue records in the normal way and you would have to delve into each individual record to get to the data. A dedicated database would allow much greater access, and is something to consider.

There may be other ideas or opinions out there, or I may have misunderstood something – please feel free to comment on this blog post!

Thank you, Christy.


Do you have experience in OCRing tabular data?

If you have experience in dealing with OCR and tabular data, one of the current JISC-funded mass digitisation projects, the Medical Officer of Health reports, led by the Wellcome Library, would like to hear from you.

Christy Henshaw, from the Wellcome Digital Library:

For our Medical Officer of Health project, we will be digitising health reports that contain a lot of information in tables (as well as charts and graphs). We plan to OCR the reports for full-text indexing, but realise that OCR’ing tabular data isn’t going to be easy, and that double- or triple-rekeying may be necessary.

I would love to hear from anyone who has had any experience with OCR’ing or rekeying tabular data (tables with both text and numbers, including merged cells both horizontally and vertically, text printed on a vertical plane, etc.).

Not only do we plan to get the tabular data into a state that can be searched (the text elements, at least), but to provide the data as CSV or Excel for downloading (as well as visible on the page images themselves). If anyone has ever provided such data from digitised content before, I’d be really interested to hear about your experiences on that too.

Many thanks!

You can post comments to this blog or contact Christy, c.henshaw AT wellcome.ac.uk, and we will then summarise them in a new post.
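For readers wondering what providing a rekeyed table as CSV might involve, the following is a minimal sketch using Python's standard `csv` module. The header, rows and the `table_to_csv` function are hypothetical; the project's actual workflow and tooling are not described in the post.

```python
import csv
import io


def table_to_csv(header, rows):
    """Serialise one rekeyed table as CSV text, keeping printed dashes as they are."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rekeyed table; the figures are invented.
header = ["Disease", "Cases", "Deaths"]
rows = [
    ["Scarlet fever", 42, 3],
    ["Measles, German", 17, "-"],
]

csv_text = table_to_csv(header, rows)
print(csv_text)
```

The `csv` writer quotes any cell that contains a comma, so row labels such as "Measles, German" survive a round trip into a spreadsheet. Merged cells have no CSV equivalent and would need a rule of their own, for example repeating the value in every cell the merge covers.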


Linked Open Data: what is it? And why is it good for you?

An excellent short video from Europeana on what linked open data is and why it is a good thing both for users and content providers.

For more information on Europeana’s work on open data see their press release of 17 February.


JISC vacancy: Programme Manager Digitisation

JISC is advertising the post of Programme Manager – e-Content: Digitisation.

The post holder will be responsible for managing a range of different types of e-content projects taking place in universities and other organisations that are either digitising content or developing existing digital collections to make them more accessible, relevant and discoverable on the web.

The approach to programme management is very much about inspiring excellence through innovative practices and helping projects to build networks and share knowledge.

The JISC e-Content team also manages the Strategic Content Alliance (SCA) which aims to enable UK public sector bodies and other key organisations to collaborate and coordinate their online activities to make the best use of funds available and to explore common services, infrastructure and approaches.

The post is based at King’s College London.

For informal enquiries please contact:
Catherine Grout, telephone 0203 006 6058 or 07958 996 647, email: c.grout@jisc.ac.uk.

Closing date: 28 February

Further details on how to apply can be found here: http://www.kcl.ac.uk/depsta/pertra/vacancy/external/pers_detail.php?jobindex=11296

JISC also has two further vacancies for:

Programme Manager – Flexible Service Delivery
More information here

Programme Manager – Digital Infrastructure Technical Directions
More information here


New JISC funding opportunities

JISC has issued 5 Invitations to Tender for Digital Infrastructure Reports.

Bidders must have knowledge of the topic area and must be experienced in producing high quality reports. These are open Invitations to Tender and anyone may bid.

5 Invitations to Tender for Reports on:

• Advantages of APIs
• Embedded Licences: What, Why and How
• Activity Data: Analytics and Metrics
• The Open Landscape
• Access to citation data: a cost-benefit and risk review and forward look

For more information please see: http://infteam.jiscinvolve.org/wp/2012/02/09/didreports/

Closing date for bids is 9th March 2012.

