The eight projects from the international Digging into Data programme presented their findings at a conference in Washington DC in June 2011. Much was discussed; here are the most pertinent topics from a JISC point of view.
- Visualisation of data was a crucial medium. If you have results from over 190,000 trials (as the Data Mining with Criminal Intent project does), the analysis will not make sense until the results are plotted in a meaningful way, with blips and rises and falls in graphs indicating new areas for academic exploration.
Yet many speakers voiced a methodological anxiety about visualisation. Did the sampling and framing of graphs lead to mistaken inferences about causation? How was absent or flawed evidence represented? Visualisation tools needed not only to be subtle and engaging; they also had to wear their methodologies on their sleeves.
- The Digging into Data programme was framed for innovative projects with a focus on research outcomes. But you could see principal investigators itching to create more sustainable services that would allow their raw data and their ingenious visualisation tools to be used by a wider community of users.
The Mapping the Republic of Letters site, which allows users to trace the movement of intellectual thought in the eighteenth century, is not only intriguing academically but also visually fascinating.
But it’s complex technology, which will shift as new content and new techniques arrive. How do such services get sustained for broader use?
- The Digging into Image Data project (looking at authorship issues in medieval manuscripts, maps and quilts) had to draw on numerous types of expertise, not just from different disciplines within the humanities (medievalists, art historians, historical geographers) but from different faculties (computer science and social sciences, plus developers).
However, one stakeholder was missing: very few libraries were cited during the conference, save as a source of original material. Do they really have no role to play in future digital research? Perhaps libraries can provide vital support for sustainability, or for the difficult work of drawing together many sources and normalising the metadata so that they can be cross-searched?
- Many at the conference voiced the difficulties in convincing ‘traditional humanities scholars’ of the worth of exploring big data. The respondent to the Mining a Year of Speech project demonstrated that the ‘Digging’ community is involved in the big, human questions that are the staple of all humanities research.
For linguists, harvesting the Internet can provide an enriched source of data for questions they may have been tackling for years: “do women talk quicker than men?”, “what makes someone sound unfriendly?”, “how does a student speak when he is uncertain?” or even “how does language change when people flirt with one another?”
If a methodological rift does emerge within the humanities, it is by returning to the shared questions that the rift can be bridged.