What Can Open Data Mean for the Humanities?: A Quick Consideration of Digitized Primary Sources

For my contribution to the Open Access Week 2016 blog posts, I wanted to write down some ideas that have been floating in my head, in an attempt to better articulate them.

In the course of my career in Digital Humanities, one of the largest stumbling blocks to using digital primary source material has been finding those materials already processed into plain text or encoded in the XML standard, TEI. Digitizing libraries' special collections is only the first step in making materials ready for digital research methodologies. Materials that are handwritten, like manuscripts and correspondence, are often difficult, if not impossible, to process using optical character recognition (OCR) software, because OCR is designed to process printed materials. If the digitized primary sources are printed, like high-resolution digital scans of a printed book, then you are set! There are several options for processing printed materials, including free and open source ones. However, if those materials are not printed, what is a researcher or project team to do? The only option is to transcribe (or to find a researcher who has already transcribed the source material and is willing to share). Transcription requires a lot of time, energy, and money. In my experience, it has become a major obstacle to potential digital research projects. So much so that I believe that if a humanities project, whether it is digital or not, produces or accesses transcriptions as part of its publication, it should endeavor to make those transcriptions openly accessible.

Following the example of the sciences, these transcriptions should be seen as data, and that data should be open. My colleague, Heather Coates, often teaches and writes about the importance of open data. The digital humanities in particular, a field committed to collaboration, sharing, and open access, should maintain the expectation, and increase the pressure, that these transcriptions, as data, be shared as part of a project's deliverables and processes. I think one of the easiest ways to accomplish this is for granting agencies, including those awarding internal university grants, to require or at least encourage successful grant applicants to share their data. Sharing these transcriptions would give other researchers the opportunity to work with these materials using digital research methods and to explore new and inventive projects. In short, by making transcriptions of handwritten material openly accessible, digital researchers can help the digital humanities flourish and grow while also embracing those core values of collaboration as the field expands.

Updated Oct 26, 2016 by Digital Humanities Librarian