Letters From the Past: Modeling Historical Sound Change Through Diachronic Character Embeddings
Sidsel Boldsen, Patrizia Paggio

TL;DR
This paper introduces an NLP method that uses diachronic character embeddings to detect historical sound change from spelling, demonstrated on the lenition of plosives in historical Danish sources.
Contribution
It proposes a new approach for modeling sound change via PPMI character embeddings and validates it on synthetic and historical Danish data.
Findings
Identified several of the sound changes under consideration in Danish historical sources
Uncovered meaningful contexts in which those changes appeared
Potential to study chronology and geography of language change
Abstract
While a great deal of work has been done on NLP approaches to lexical semantic change detection, other aspects of language change have received less attention from the NLP community. In this paper, we address the detection of sound change through historical spelling. We propose that a sound change can be captured by comparing the relative distance through time between the distributions of the characters involved, modeled using PPMI character embeddings. We verify this hypothesis in synthetic data and then test the method's ability to trace the well-known historical change of lenition of plosives in Danish historical sources. We show that the models are able to identify several of the changes under consideration and to uncover meaningful contexts in which they appeared. The methodology has the potential to contribute to the study of open questions such as the relative chronology of sound shifts and their geographical…
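To make the idea in the abstract concrete, here is a minimal Python sketch of PPMI character embeddings and a distance comparison across two periods. It is an illustration under stated assumptions, not the authors' implementation: the function names, the context window of one character, the `#` boundary marker, the use of cosine distance, and the toy word lists are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def ppmi_char_embeddings(words, window=1):
    """Map each character to a sparse PPMI vector over neighbouring characters.

    Assumptions for this sketch: '#' pads word boundaries and the context
    is the `window` characters on either side of the target.
    """
    pair_counts = Counter()
    for word in words:
        padded = "#" + word + "#"
        for i, target in enumerate(padded):
            lo = max(0, i - window)
            hi = min(len(padded), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(target, padded[j])] += 1

    total = sum(pair_counts.values())
    target_counts = Counter()
    context_counts = Counter()
    for (target, context), n in pair_counts.items():
        target_counts[target] += n
        context_counts[context] += n

    vectors = {}
    for (target, context), n in pair_counts.items():
        pmi = math.log2(n * total / (target_counts[target] * context_counts[context]))
        if pmi > 0:  # positive PMI: keep only positive associations
            vectors.setdefault(target, {})[context] = pmi
    return vectors


def cosine_distance(u, v):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


# Invented toy spellings (not real attestations): final 'k' in the
# "early" list is written 'g' in the "late" list.
early = ["bok", "tak", "sak", "gal", "god", "kal", "kom"]
late = ["bog", "tag", "sag", "gal", "god", "kal", "kom"]

for label, words in (("early", early), ("late", late)):
    vectors = ppmi_char_embeddings(words)
    dist = cosine_distance(vectors.get("k", {}), vectors.get("g", {}))
    print(label, "distance(k, g) =", round(dist, 3))
```

Comparing the two printed distances shows how a shift in where a character occurs changes its distributional profile between periods. The paper's actual measure of relative distance through time may differ from this within-period comparison.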
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Language and cultural evolution
