Aside from preparing for the onslaught of instruction that fall semester will bring, my time lately has been spent exploring topic modeling (I realize that I am somewhat late to the game on this, but it has been on my ‘to do’ list for a while now). After installing MALLET, a Java-based natural language processing package that facilitates topic modeling among other things, reading this helpful tutorial, and seeing evidence of topic modeling’s utility for analyzing large volumes of text, I am intrigued but also somewhat overwhelmed. The further I move away from introductory explanations of topic modeling, like David M. Blei’s “Topic Modeling and Digital Humanities,” and the closer I get to comprehensive explanations of how a topic model like latent Dirichlet allocation actually works, the more overwhelmed I become. Compounding this uneasiness is the recognition that topic modeling is predicated on certain assumptions about the way language works that not all linguists share. What is a librarian who is neither quantitatively inclined nor a linguist to do?
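For readers who, like me, want to peek under the hood, the core idea of latent Dirichlet allocation can be sketched in a few dozen lines. The toy corpus, number of topics, and hyperparameters below are invented purely for illustration, and this is a minimal collapsed Gibbs sampler written from scratch, not what MALLET actually runs (its implementation is far more sophisticated), but it shows the basic mechanism: repeatedly reassigning each word to a topic based on which topics dominate its document and which words dominate each topic.

```python
# A toy collapsed Gibbs sampler for LDA. Corpus and parameters are
# hypothetical, chosen only to illustrate the mechanics.
import random
from collections import defaultdict

random.seed(42)

docs = [
    "dog cat pet fur dog cat".split(),
    "cat pet dog animal pet".split(),
    "stock market trade price stock".split(),
    "market price stock fund trade".split(),
]

K = 2                    # number of topics (chosen by the analyst)
alpha, beta = 0.1, 0.01  # Dirichlet hyperparameters

vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# Count tables the sampler maintains
ndk = [[0] * K for _ in docs]                # doc -> topic counts
nkw = [defaultdict(int) for _ in range(K)]   # topic -> word counts
nk = [0] * K                                 # total words per topic
z = []                                       # topic assignment per token

# Initialize every token with a random topic
for d, doc in enumerate(docs):
    zd = []
    for w in doc:
        t = random.randrange(K)
        zd.append(t)
        ndk[d][t] += 1
        nkw[t][w] += 1
        nk[t] += 1
    z.append(zd)

# Gibbs sweeps: remove a token's assignment, then resample it from
# the full conditional p(topic | everything else)
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            weights = [
                (ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                for k in range(K)
            ]
            r = random.uniform(0, sum(weights))
            acc = 0.0
            for k, wt in enumerate(weights):
                acc += wt
                if r <= acc:
                    t = k
                    break
            z[d][i] = t
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

# Show the top words per topic
for k in range(K):
    top = sorted(nkw[k], key=nkw[k].get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

On a corpus this tiny the two topics should separate the "pets" documents from the "finance" documents, which is exactly the intuition the introductory explanations describe; the hard part that the comprehensive treatments tackle is why this sampling scheme converges to the model's posterior.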
Whether or not I know enough about linguistics or statistics to understand the deeper implications of choosing a method like topic modeling is beside the point. The fact is, as libraries increasingly become home to large amounts of digitized text and interest in digital projects grows, researchers at our institutions may look to us as potential resources when engaged in large-scale textual analysis. I would argue that it behooves those of us who are willing to do so to familiarize ourselves with some of the approaches, like topic modeling, that are common in digital scholarship.