02. March 2015 · Comments Off on What Do You Do With 375.000 Digitised Norwegian Books? · Categories: Conference Report, Digital Humanities

On 24 and 25 February, the Digital Humanities Forum at the University of Oslo hosted two half-day seminars focusing on digital textual studies. The first was a joint seminar with the newly established Digital Humanities Center at Gothenburg University and the Digital Humanities Lab Denmark. Under the heading “litteraturforskning og digitale verktøy” (literary studies and digital tools), Jon Haarberg (University of Oslo) and Jenny Bergenmar, Mats Malm and Sverker Lundin (Gothenburg University) shared their experiences with digitisation, digital editing, electronic literature and textual analysis. Among the projects presented were the digital edition of Petter Dass’ catechism songs, Språkbanken and Litteraturbanken (Swedish), the Women Writers Network, and poeter.se, the largest Swedish online platform and archive for modern poetry and writing. Bergenmar and Malm also presented the new DH center at Gothenburg University and its plans for a master’s programme in DH. The Swedes started a seminar series on DH in the fall semester of 2014 that will continue in 2015.

The second half-day seminar, on 25 February, was dedicated to textual analysis, especially topic modeling: “Kulturens tekster som big data. Om å analysere tekster digitalt” (Cultural textual heritage as big data. On analysing texts digitally). The day opened with a presentation by Peter Leonard (Yale University Library & Digital Humanities Lab) titled “Topic Modeling & the Canon. Using curated collections to understand the ‘Great Unread’”, which gave a thorough introduction to topic modeling and closed with some great case studies (e.g. Robots Reading Vogue). After lunch, Jon Arild Olsen from the Norwegian National Library presented the library’s long-term digitisation project, started in 2006, in which its complete holdings are being digitised (imaging, text recognition and text encoding) and made available to the public. This will include ca. 375.000 books (from as early as 1790), 3.2 million newspaper issues, and 42.000 periodicals (amounting to 2 million single volumes). The project is scheduled for completion in 2018. Arne Martinus Lindstad (Norwegian National Library) talked about the library’s n-gram project, while Lars Johnsen presented topic modeling with the National Library’s text corpus.
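To give a rough idea of what an n-gram project like the National Library’s involves: at its core it counts how often every sequence of n consecutive words occurs across a corpus, so that frequencies can be compared over time. The following is a minimal sketch of that counting step in plain Python, not a reflection of the library’s actual implementation; the example sentence and function names are my own.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "to be or not to be".split()
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts[("to", "be")])  # the bigram "to be" occurs twice
```

Scaled up to 375.000 books, the same counts per publication year yield the frequency curves an n-gram viewer plots.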

After a lively discussion with the audience, this edition’s DH Forum host, Anne Birgitte Rønning, and I proposed a hands-on topic modeling workshop to be held at the University of Oslo in the near future. The current vice dean for research, Ellen Rees, announced the revival of the interdisciplinary research group “tekstutgivelse” (text editing and publishing), which will serve as a link between the National Library’s digital corpus and the corpus-based research and teaching at the Department of Linguistics and Scandinavian Studies, and which hopes to stimulate digital textual analysis endeavours.

I also did some live-tweeting during the seminars: #DHOslo
