Machine-ready collections and fighting the AI hype


Recent webinar draws the crowds

Last Wednesday we were pleased to be joined by 180 people for our Making Your Collections AI Ready webinar. The session focused on how academic libraries can get their collections online in forms that both people and machines can consume. We heard from Ines Byrne of the National Library of Scotland, whose key point was that we can start simple: we don’t need to produce huge datasets, because small datasets can also be valuable. She also spoke about the dangers of perfectionism; datasets don’t have to be perfect to be useful.

Jodie Double of the University of Leeds Library spoke about how the journey towards publishing collections as data involves understanding the risks, being brave and experimental, and recognising that it must be a collective effort.

I will be writing a full report on the session, including the recording, transcript and slides, and will try to answer some of the many questions the audience asked. Look out for it on this blog in a few weeks’ time.

Knowing Machines

In the meantime, I thought it would be useful to point to the work of Knowing Machines, “a research project tracing the histories, practices, and politics of how machine learning systems are trained to interpret the world”, which is sponsored by the Alfred P. Sloan Foundation. A crack team led by PI Kate Crawford has been assembled to focus on “developing critical methodologies and tools for understanding, analyzing, and investigating training datasets, and studying their role in the construction of ‘ground truth’ for machine learning.”

I want to draw particular attention to a publication on the site called A Critical Field Guide for Working with Machine Learning Datasets*. The site describes the guide as follows: “This guide offers questions, suggestions, strategies, and resources to help people work with existing machine learning datasets at every phase of their life-cycle.” It is designed for everyone from the curious to the well-versed.

The site is also a useful resource for debunking AI hype, and it hosts a podcast that offers insights into how datasets are formed and how they shape the outputs of AI.

* S. Ciston, “A Critical Field Guide for Working with Machine Learning Datasets,” K. Crawford and M. Ananny, Eds., Knowing Machines project, Feb. 2023.

Fighting the hype

Also on the theme of debunking AI hype, another podcast worth exploring is the light-hearted yet serious and satirical Mystery AI Hype Theater 3000. A recent episode should interest archivists: it examines the 1955 grant proposal by John McCarthy, Marvin Minsky, Nathaniel Rochester and Claude Shannon that led to the Dartmouth meeting, the first meeting about AI. That meeting started the tradition of hype around intelligent machines. The podcast presenters, linguist Emily M. Bender and sociologist Alex Hanna, suggest that we still don’t have an agreed definition of intelligence, so what are we doing claiming that machines are intelligent, or more intelligent than humans?

The podcast comes from the Distributed AI Research Institute (DAIR), which describes itself as “an interdisciplinary and globally distributed AI research institute rooted in the belief that AI is not inevitable, its harms are preventable, and when its production and deployment include diverse perspectives and deliberate processes it can be beneficial.”

A year-long study into the use of AI by US universities

It is also worth keeping an eye out for reports from a project established by the US-based Ithaka S+R. Its study of generative AI in higher education is working with 20 US universities on a year-long project that will result in three reports. You can read more at Making AI Generative for Higher Education – Ithaka S+R.

This post forms part of a series on Artificial Intelligence and the things we can do to be more aware of its underlying technologies.

By Peter Findlay

Subject Matter Expert, Digital Scholarship, Content and Discovery, Jisc

Working with Jisc's Higher Education members to improve access to their special collections in the age of data-centric arts, humanities and social science research.

I am a site admin for this website.
