Creating Keywords Automatically

There are an awful lot of interesting ideas to unpack in the Nineteenth-Century Serials Edition (ncse) resource mentioned in a previous posting.

For a start, there is a novel addition to the way search results are displayed: each result shows the image reproduction of the page as well as the OCR’d transcription.

Then there is the whole range of partners involved in such a website, which indicates who needs to be on board to run an ambitious digitisation project.

And the related conference brought up a whole host of intellectual questions about integrating the work into scholarly research.

But what is most interesting is the project’s attempt to assign subject keywords to the articles in its resource automatically, using natural language processing.

Each article on the website has been processed in two ways: firstly, to extract persons, places and institutions from the complete data; and secondly, to create subject terms (e.g. Arts & Crafts or Emotional Actions, States & Processes) that relate to each of the digitised articles in the collection.

This is handy for users because it bypasses the tyranny of having to use precise search terms to discover particular articles; and it’s useful for the digitiser because they do not have to go through each article individually and make manual decisions about its subject.
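
To make the two passes a little more concrete, here is a deliberately crude sketch in Python. It is not the ncse project's method, which I have not seen in detail; it only illustrates the general idea. The two subject headings are taken from the examples above, but the cue words listed under them, the function names and the capitalisation heuristic are all my own assumptions.

```python
import re
from collections import Counter

# Hypothetical vocabulary: the headings appear in the post, but the cue
# words under each heading are invented purely for illustration.
SUBJECT_CUES = {
    "Arts & Crafts": {"painting", "engraving", "pottery", "weaving"},
    "Emotional Actions, States & Processes": {"grief", "joy", "anger", "fear"},
}


def extract_name_candidates(text):
    """Pass 1 (toy version): collect runs of capitalised words.

    The first word of each sentence is skipped, because it is
    capitalised whether or not it is a name. Real entity extraction
    is far more sophisticated than this heuristic.
    """
    candidates = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        run = []
        for word in words[1:]:
            if word[0].isupper():
                run.append(word)
            else:
                if run:
                    candidates.append(" ".join(run))
                run = []
        if run:
            candidates.append(" ".join(run))
    return candidates


def assign_subjects(text, min_hits=1):
    """Pass 2 (toy version): score each subject by counting its cue words."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    subjects = []
    for subject, cues in SUBJECT_CUES.items():
        hits = sum(tokens[cue] for cue in cues)
        if hits >= min_hits:
            subjects.append((subject, hits))
    # Subjects with the most cue-word hits come first.
    return sorted(subjects, key=lambda pair: -pair[1])
```

Run over an OCR'd article, the first function would give a list of candidate names to index and the second a ranked list of subject terms. Even this toy version shows where the faults creep in: OCR errors break the cue words, and any capitalised word in mid-sentence is treated as a name.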

I’m not sure it completely works as yet (there are some faults in the results and the interface is not intuitive), but this is a brave and valuable step in trying to really exploit the richness of digitised resources, a richness we have not really tapped into yet.
