Voices from the Past: A Digital Exploration of Australian World War I Diaries


DHA 2025: Long Talk



Abstract:

Advances in computational text analysis have opened new avenues for the study of historical documents. This work focuses on a collection of 519 Australian World War I diaries held by the State Library of New South Wales. The library made digital transcripts of these diaries available in conjunction with the centenary of the war, but at the time it did not anticipate their use for computational analysis. This paper therefore begins by presenting our work on cleaning the data to make it suitable for computational analysis, with a focus on the need for consistency and structured metadata. The diaries were linked with service records from the AIF Project, giving a fuller picture of the men behind the diaries. We present statistics on the authors that show a survivorship bias and a bias towards men who lived in New South Wales. Neither is surprising: a diary is more likely to survive if its author also survived, and the acquiring library is in New South Wales. We discuss why such biases must be understood when analysing collections like this one. Finally, using the cleaned data, we present results from several computational techniques, namely word frequencies, tf-idf, topic modelling, and sentiment analysis, to understand what the diarists wrote about and how they felt about it.
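The abstract does not say how tf-idf was computed. As a minimal sketch, assuming each cleaned diary transcript is treated as one document and using one common weighting (tf = term count / document length, idf = log(N / df)), the standard-library Python below ranks the terms that are most distinctive to a single diary. The function names `tokenize`, `tf_idf` and `top_terms`, along with the three-entry toy corpus, are hypothetical illustrations. They are not drawn from the collection.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Per-document tf-idf weights: tf = count / doc length, idf = log(N / df)."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    scores: dict[str, dict[str, float]] = {}
    for name, toks in tokens.items():
        total = len(toks)
        if total == 0:
            scores[name] = {}  # empty transcript: nothing to weight
            continue
        counts = Counter(toks)
        scores[name] = {
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
    return scores


def top_terms(scores: dict[str, dict[str, float]], name: str, k: int = 3) -> list[tuple[str, float]]:
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(scores[name].items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy entries, for illustration only.
docs = {
    "diary_a": "Landed at Gallipoli. Heavy shelling.",
    "diary_b": "Rain in the trenches. Shelling again at night.",
    "diary_c": "Rain and mud. Letters from home today.",
}
scores = tf_idf(docs)
print(top_terms(scores, "diary_a"))
```

Terms shared across diaries, such as "shelling" in this example, receive lower weights than terms unique to one diary. This is why tf-idf tends to surface what sets an individual diarist apart rather than what the whole collection has in common. A production pipeline would likely add stop-word removal and lemmatisation before weighting.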