Deciphering Undersegmented Ancient Scripts Using Phonetic Prior
Jiaming Luo, Frederik Hartmann, Enrico Santus, Yuan Cao, Regina Barzilay

TL;DR
This paper introduces a phonetics-aware model for deciphering ancient scripts that are not fully segmented into words and whose closest known language has not been determined, using linguistic constraints and IPA-based character embeddings.
Contribution
It presents a novel generative framework that jointly models word segmentation and cognate alignment using phonological constraints and IPA-based character embeddings.
Findings
Improves decipherment accuracy on Gothic and Ugaritic scripts.
Proposes a measure for language closeness that aligns with scholarly consensus.
Demonstrates the model's effectiveness on both deciphered and undeciphered languages.
Abstract
Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure for language closeness…
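To make the idea of jointly choosing word boundaries and cognate matches more concrete, here is a minimal Python sketch. It is not the paper's model: the paper describes a generative framework with learned IPA-based embeddings, whereas this toy swaps in hand-written articulatory feature sets and a cost-minimizing dynamic program. The feature table, the names `sub_cost`, `phonetic_distance`, and `segment`, and the `unmatched_cost` and `max_len` parameters are all assumptions made for illustration.

```python
# Hypothetical, hand-picked articulatory features per IPA symbol (illustrative only;
# the paper learns embeddings rather than using a fixed table like this).
FEATURES = {
    "p": {"stop", "labial", "voiceless"},
    "b": {"stop", "labial", "voiced"},
    "t": {"stop", "alveolar", "voiceless"},
    "d": {"stop", "alveolar", "voiced"},
    "k": {"stop", "velar", "voiceless"},
    "g": {"stop", "velar", "voiced"},
    "a": {"vowel", "open"},
    "i": {"vowel", "close", "front"},
    "u": {"vowel", "close", "back"},
}


def sub_cost(x, y):
    """Substitution cost: 1 minus the Jaccard similarity of the two feature sets."""
    fx, fy = FEATURES[x], FEATURES[y]
    return 1.0 - len(fx & fy) / len(fx | fy)


def phonetic_distance(s, t, indel=1.0):
    """Edit distance where substituting similar sounds is cheaper than dissimilar ones."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i, x in enumerate(s, 1):
        cur = [i * indel]
        for j, y in enumerate(t, 1):
            cur.append(min(prev[j] + indel,              # delete x
                           cur[j - 1] + indel,           # insert y
                           prev[j - 1] + sub_cost(x, y)))  # substitute x -> y
        prev = cur
    return prev[-1]


def segment(text, vocab, unmatched_cost=0.6, max_len=6):
    """Split `text` into spans; each span is aligned to its closest word in
    `vocab` or left unmatched, whichever is cheaper. Returns (cost, spans)."""
    n = len(text)
    best = [0.0] + [float("inf")] * n   # best[e] = cheapest analysis of text[:e]
    back = [None] * (n + 1)             # back[e] = (span start, matched word or None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            span = text[start:end]
            candidates = [(unmatched_cost * len(span), None)]
            candidates += [(phonetic_distance(span, w), w) for w in vocab]
            cost, word = min(candidates, key=lambda c: c[0])
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = (start, word)
    # Follow back-pointers from the end of the text to recover the spans.
    spans, end = [], n
    while end > 0:
        start, word = back[end]
        spans.append((text[start:end], word))
        end = start
    return best[n], spans[::-1]


cost, spans = segment("patigu", ["bad", "iku"])
```

On the toy input, the unsegmented string `patigu` is split into `pat` and `igu`, matched to the hypothetical known-language words `bad` and `iku`, because voicing differences such as p/b or g/k are cheap under the feature-based cost. A learned embedding space plays the analogous role in the paper, but with costs estimated from data rather than fixed by hand.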
Taxonomy
Topics: Natural Language Processing Techniques · Phonetics and Phonology Research · Speech Recognition and Synthesis
