Neural Unsupervised Reconstruction of Protolanguage Word Forms

Andre He; Nicholas Tomlin; Dan Klein

arXiv:2211.08684 · cs.CL · November 17, 2022

TL;DR

This paper introduces a neural method for unsupervised reconstruction of ancient word forms, improving accuracy over previous approaches by capturing complex phonological and morphological changes.

Contribution

It extends classical expectation-maximization methods with neural models, while preserving their inductive biases through monotonic alignment constraints and deliberate underfitting during the maximization step.
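To make the "monotonic alignment constraint" concrete, here is a toy, non-neural sketch (an illustrative assumption, not the paper's model). An ancient form is rewritten left to right into a modern form using only substitutions, insertions, and deletions, so aligned segments can never cross. A forward dynamic program sums the weight of every such monotonic edit sequence. The function `edit_weight` and its constants are hypothetical; a neural model would presumably condition these weights on surrounding context rather than fixing them.

```python
from typing import Callable, Optional


# Hypothetical, fixed edit weights for illustration only.
def edit_weight(src: Optional[str], tgt: Optional[str]) -> float:
    if src is None:  # insertion of tgt
        return 0.01
    if tgt is None:  # deletion of src
        return 0.01
    return 0.9 if src == tgt else 0.05  # copy or substitution


def monotonic_alignment_score(
    ancient: str,
    modern: str,
    weight: Callable[[Optional[str], Optional[str]], float] = edit_weight,
) -> float:
    """Sum the weights of all monotonic edit sequences turning `ancient` into `modern`.

    forward[i][j] holds the total weight of aligning ancient[:i] with modern[:j].
    Every step consumes characters strictly left to right, so alignments never cross.
    """
    n, m = len(ancient), len(modern)
    forward = [[0.0] * (m + 1) for _ in range(n + 1)]
    forward[0][0] = 1.0  # empty prefix aligned with empty prefix
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # delete ancient[i-1]
                forward[i][j] += forward[i - 1][j] * weight(ancient[i - 1], None)
            if j > 0:  # insert modern[j-1]
                forward[i][j] += forward[i][j - 1] * weight(None, modern[j - 1])
            if i > 0 and j > 0:  # copy or substitute ancient[i-1] -> modern[j-1]
                forward[i][j] += forward[i - 1][j - 1] * weight(ancient[i - 1], modern[j - 1])
    return forward[n][m]
```

Because these weights are not normalized into a proper distribution, the result is a score rather than a probability. In a classical EM setup, forward sums like this one (paired with matching backward sums) are what yield expected edit counts in the E-step.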

Findings

01 Reduced edit distance in Latin reconstruction
02 Effective modeling of complex phonological changes
03 Improved accuracy over prior methods

Abstract

We present a state-of-the-art neural approach to the unsupervised reconstruction of ancient word forms. Previous work in this domain used expectation-maximization to predict simple phonological changes between ancient word forms and their cognates in modern languages. We extend this work with neural models that can capture more complicated phonological and morphological changes. At the same time, we preserve the inductive biases from classical methods by building monotonic alignment constraints into the model and deliberately underfitting during the maximization step. We evaluate our performance on the task of reconstructing Latin from a dataset of cognates across five Romance languages, achieving a notable reduction in edit distance from the target word forms compared to previous methods.
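The evaluation metric named in the abstract is edit distance between each reconstruction and its target Latin form. The following is a minimal sketch, assuming plain unweighted Levenshtein distance over characters and a simple mean over word pairs. The paper's exact segmentation (characters vs. phonemes) and any normalization are not stated in this summary, and `mean_edit_distance` is a hypothetical helper name.

```python
def edit_distance(a: str, b: str) -> int:
    """Unweighted Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def mean_edit_distance(predictions: list[str], targets: list[str]) -> float:
    """Average edit distance between paired reconstructions and gold forms."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need equally many, non-empty predictions and targets")
    return sum(edit_distance(p, t) for p, t in zip(predictions, targets)) / len(targets)
```

Lower is better. Only two rows of the table are kept in memory, so the cost is O(len(a) · len(b)) time and O(len(b)) space per pair.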

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Mathematics, Computing, and Information Processing