Part 7 Conclusion

This project set out to answer a simple question: what can be learnt from a corpus of interviews about modern music history using text analysis and mining? In the process, I was reminded that more often than not the search for answers only leads to more questions. That has been true in my experience as a journalist, and it has proved true of my early days as a Digital Humanist.

In the end, though, I feel that I have accomplished what I set out to do. There is clearly plenty to be learnt from applying text analysis and mining to modern music history texts: how people talk about music publicly and privately, what vocabulary they use, and what networks might link the things they talk about.

As far as this project is concerned, my main takeaway is that the texts themselves need more preparatory work to support better analysis and machine learning.

A first step would be to add useful metadata to the files, such as categories (based, for example, on genre or location) and the year in which each lecture took place. These two additions alone would allow us to look at how the lectures change over time, or at the different ways in which artists from specific genres talk about their practice. Such metadata is common in text datasets and could easily be added to the existing version of the corpus as a separate text file that corpus packages and libraries can read alongside the lectures.
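To give a rough idea of how little is needed, the Python sketch below assumes a hypothetical CSV file with one row per lecture and columns for `filename`, `year` and `genre`. The file names and values are invented placeholders, not details of the real corpus. It loads the metadata with the standard library's `csv.DictReader` and groups lecture files by any field, which is the first step towards comparing lectures by year or by genre.

```python
import csv
import io
from collections import defaultdict

# Hypothetical metadata: one row per lecture file. Column names and values
# are placeholders for illustration, not taken from the real corpus.
EXAMPLE_METADATA = """filename,year,genre
lecture_a.txt,2005,techno
lecture_b.txt,2005,hip hop
lecture_c.txt,2011,techno
"""


def load_metadata(csv_file):
    """Read a metadata CSV into a dict keyed by lecture file name."""
    reader = csv.DictReader(csv_file)
    return {row["filename"]: row for row in reader}


def group_by(metadata, field):
    """Map each value of `field` (e.g. "year" or "genre") to the files that have it."""
    groups = defaultdict(list)
    for filename, row in sorted(metadata.items()):
        groups[row[field]].append(filename)
    return dict(groups)


metadata = load_metadata(io.StringIO(EXAMPLE_METADATA))
by_year = group_by(metadata, "year")
by_genre = group_by(metadata, "genre")
print(by_year)
print(by_genre)
```

With a real file, `io.StringIO(EXAMPLE_METADATA)` would be swapped for something like `open("metadata.csv", newline="", encoding="utf-8")` (the file name is again hypothetical). Keeping the metadata in a separate file means the lecture texts themselves stay untouched, and new categories can be added as extra columns later.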

Alongside this, XML could be used to mark up the text in more detail, for example with part-of-speech (POS) tags to support better lemmatization during the cleaning process, or with tags for the locations and people mentioned in the lectures to help create network visualisations.
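A minimal sketch of the idea, using Python's built-in `xml.etree.ElementTree`: each token is wrapped in a `<w>` element carrying `pos` and `lemma` attributes (loosely modelled on the TEI `<w>` element, though this is not a full TEI document), and the cleaning step then reads the lemmas back out, optionally keeping only certain parts of speech. The tagged tokens are typed in by hand as an assumption; in practice they would come from a POS tagger.

```python
import xml.etree.ElementTree as ET


def tokens_to_xml(tokens):
    """Wrap (word, pos, lemma) triples in <w> elements inside an <s> sentence element."""
    sentence = ET.Element("s")
    for word, pos, lemma in tokens:
        w = ET.SubElement(sentence, "w", pos=pos, lemma=lemma)
        w.text = word
    return ET.tostring(sentence, encoding="unicode")


def lemmas_from_xml(xml_string, keep_pos=None):
    """Return the lemma of every <w> element, optionally filtered by POS tag."""
    root = ET.fromstring(xml_string)
    return [
        w.get("lemma")
        for w in root.iter("w")
        if keep_pos is None or w.get("pos") in keep_pos
    ]


# Hand-tagged example sentence (hypothetical; tags follow the Universal POS set).
tagged = [
    ("We", "PRON", "we"),
    ("were", "AUX", "be"),
    ("making", "VERB", "make"),
    ("records", "NOUN", "record"),
]
xml_string = tokens_to_xml(tagged)
print(xml_string)
print(lemmas_from_xml(xml_string, keep_pos={"NOUN", "VERB"}))
```

The benefit of storing the POS tag next to each word is that the lemmatizer no longer has to guess: once "making" is marked as a verb it can safely be reduced to "make", and the same markup lets the cleaning step keep only nouns and verbs when that suits the analysis.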

While putting this project together, I spent some time looking into Named Entity Recognition (NER), which extracts entities such as locations and people from text. I ran some tests on the corpus using the spaCy library in Python, for example counting how often a specific location was mentioned, and it’s clear to me that there is potential there for interesting analysis and for building networks based on this information.
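The kind of test described above could be sketched roughly as follows. This is an illustration rather than the project's own code: it assumes spaCy is installed with a pretrained English pipeline such as `en_core_web_sm`, whose entity labels include `GPE` (countries, cities, states), `LOC` (other locations) and `PERSON`. The extraction step relies only on spaCy's `nlp.pipe`, `doc.ents`, `ent.text` and `ent.label_`; the location counts and the co-occurrence edges, a simple starting point for a network of places and people, use only the standard library. The sample entities are hand-written stand-ins, not output from the corpus.

```python
from collections import Counter
from itertools import combinations

LOCATION_LABELS = {"GPE", "LOC"}
NETWORK_LABELS = {"GPE", "LOC", "PERSON"}


def extract_entities(texts, nlp):
    """Yield one list of (text, label) pairs per lecture.

    `nlp` is assumed to be a loaded spaCy pipeline, e.g.
    spacy.load("en_core_web_sm"). It is passed in so that this module
    does not need spaCy just to be imported.
    """
    for doc in nlp.pipe(texts):
        yield [(ent.text, ent.label_) for ent in doc.ents]


def count_locations(entities_per_lecture, labels=LOCATION_LABELS):
    """Count how often each location is mentioned across all lectures."""
    counts = Counter()
    for entities in entities_per_lecture:
        counts.update(text for text, label in entities if label in labels)
    return counts


def cooccurrence_edges(entities_per_lecture, labels=NETWORK_LABELS):
    """Count pairs of entities that appear in the same lecture.

    Each lecture adds at most one to a given pair, and names are sorted
    so that ("Berlin", "Detroit") and ("Detroit", "Berlin") are one edge.
    """
    edges = Counter()
    for entities in entities_per_lecture:
        names = sorted({text for text, label in entities if label in labels})
        edges.update(combinations(names, 2))
    return edges


# Hand-written stand-in for spaCy output (hypothetical, not from the corpus).
sample = [
    [("Detroit", "GPE"), ("Berlin", "GPE"), ("Detroit", "GPE"), ("Alex", "PERSON")],
    [("Berlin", "GPE"), ("Kingston", "GPE")],
]
print(count_locations(sample).most_common())
print(cooccurrence_edges(sample))
```

With spaCy available, the pieces could be combined along the lines of `lectures = list(extract_entities(texts, spacy.load("en_core_web_sm")))`; the `list` matters because a generator can only be consumed once and both counting functions need to read it. One caveat worth keeping in mind is that NER output is noisy: spelling variants of the same place are counted as separate entities unless they are normalised first.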

As a closing note, I also believe that music journalism has potential as a field of study within the practice of distant reading. While music journalism does not span as long a period as fiction, to take one of the most popular subjects of distant reading, it does offer a lot of text to work with. In the past century alone, thousands of music magazines and books have been published, and in the past 30 years more digital items than can be counted have been put online. Not all of these are available, for various reasons starting with copyright, but while undertaking this project I have been thinking a lot about what it might look like to begin pulling music journalism texts together into data sets that can be used for analysis.

Some of this already exists: for example, there is a data set of reviews taken from the popular online magazine Pitchfork (note 25), many US music magazines such as Billboard and Vibe are available on Google Books, and various fanzine archives exist on […]. I had always wanted the RBMA lecture archive to be an addition to these existing pockets of music journalism history, but now I am also thinking about what it might be like to start bringing all these things together rather than leaving them separate, and about how techniques such as Optical Character Recognition (OCR) could bring into the digital realm important music journalism that still exists only in print.

As Ted Underwood put it in Distant Horizons: Digital Evidence and Literary Change, “there will certainly be cases where quantitative evidence uncovers puzzles that still lack an explanation” (Underwood 2019). Music journalists and writers often come across such puzzles in their work and spend years trying to figure them out. It’s becoming increasingly clear to me that the ways in which the Digital Humanities help us combine the qualitative with the quantitative can help them too.

Photo by Karel Chladek


Underwood, Ted. 2019. Distant Horizons: Digital Evidence and Literary Change. 1st ed. The University of Chicago Press.