TL;DR
This study automates the matching and linking of entries across editions of the Swedish encyclopedia Nordisk familjebok using semantic sentence embeddings and a transformer-based classifier, revealing geographic shifts between editions that are influenced by historical events.
Contribution
It introduces a method for aligning and analyzing entries across encyclopedia editions using NLP techniques, enabling insights into historical geographic focus changes.
Findings
Identified a shift in geographic focus from Europe to other continents between the first and second editions.
Demonstrated the effectiveness of semantic embeddings for entry matching.
Linked geographic entries to Wikidata for further analysis.
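The embedding-based entry matching mentioned above can be illustrated with a minimal sketch. It assumes each entry has already been turned into a sentence-embedding vector by some model; the three-dimensional vectors, the headwords, the `match_entries` helper, and the 0.8 threshold are all hypothetical toy values chosen for illustration, not the paper's data or settings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_entries(first, second, threshold=0.8):
    """Map each first-edition headword to its most similar second-edition
    headword, or to None when no candidate reaches the threshold.

    `first` and `second` are dicts of headword -> embedding vector.
    The threshold is an assumed value, not one taken from the paper.
    """
    matches = {}
    for head1, vec1 in first.items():
        best_head, best_score = None, threshold
        for head2, vec2 in second.items():
            score = cosine(vec1, vec2)
            if score >= best_score:
                best_head, best_score = head2, score
        matches[head1] = best_head
    return matches


# Toy stand-ins for real sentence embeddings (hypothetical values).
first_edition = {"Upsala": [0.9, 0.1, 0.0], "Abbas": [0.0, 0.2, 0.9]}
second_edition = {"Uppsala": [0.8, 0.2, 0.1], "Chicago": [0.1, 0.9, 0.1]}

matches = match_entries(first_edition, second_edition)
print(matches)
```

In this toy run, an entry with no sufficiently similar counterpart maps to `None`, which is one simple way to flag entries that were dropped or added between editions.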
Abstract
The Nordisk familjebok is a Swedish encyclopedia from the 19th and 20th centuries. It was written by a team of experts and aimed to be an intellectual reference, stressing precision and accuracy. The encyclopedia had four main editions, notable for their size, which ranged from 20 to 38 volumes. As a consequence, the Nordisk familjebok had a considerable influence in universities, schools, the media, and society overall. As new editions were released, the selection of entries and their content evolved, reflecting intellectual changes in Sweden. In this paper, we used digitized versions from Project Runeberg. We first resegmented the raw text into entries and matched pairs of entries between the first and second editions using semantic sentence embeddings. We then extracted the geographical entries from both editions using a transformer-based classifier and…