From Isolates to Families: Using Neural Networks for Automated Language Affiliation
Frederic Blum, Steffen Herbold, Johann-Mattis List

TL;DR
This paper introduces neural network models that classify languages into families from lexical and grammatical data, automating part of the language affiliation workflow in historical linguistics and suggesting possible affiliations for language isolates.
Contribution
It presents the first neural network approach combining lexical and grammatical data for automated language family classification, as a complement to the traditional workflow of manually comparing individual languages.
Findings
Models trained on lexical data outperform grammatical-only models.
Combining lexical and grammatical data yields better classification accuracy.
Models can identify relationships among language subgroups and suggest affiliations for isolated languages.
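The idea of combining the two data types can be illustrated with a minimal sketch. It is not the authors' model: the architecture used in the paper is not described in this summary, so the example assumes the simplest possible neural classifier (a single softmax layer trained by gradient descent) and a tiny invented dataset. The binary feature vectors, the two family labels, and the helper names `combine`, `train`, and `predict` are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(lexical, grammatical):
    """Concatenate lexical and grammatical features into one input vector."""
    return list(lexical) + list(grammatical)


def class_scores(weights, biases, x):
    """Linear score for each family: dot(weights[c], x) + biases[c]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train(samples, labels, n_classes, epochs=200, lr=0.5, seed=0):
    """Fit a single softmax layer with stochastic gradient descent."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            probs = softmax(class_scores(weights, biases, x))
            for c in range(n_classes):
                # Cross-entropy gradient with respect to the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring family."""
    scores = class_scores(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Invented toy data: 4 lexical and 2 grammatical binary features per language.
# Family 0 and family 1 are placeholders, not real language families.
lexical = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1],
           [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]]
grammatical = [[1, 0], [1, 0], [1, 0],
               [0, 1], [0, 1], [0, 1]]
labels = [0, 0, 0, 1, 1, 1]

samples = [combine(lex, gram) for lex, gram in zip(lexical, grammatical)]
weights, biases = train(samples, labels, n_classes=2)
predictions = [predict(weights, biases, x) for x in samples]
print(predictions)
```

Concatenation is only one way to merge the two feature sets; it is chosen here because it keeps the sketch short. Training the same classifier on `lexical` alone or `grammatical` alone gives the single-source baselines that the findings compare against.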
Abstract
In historical linguistics, the affiliation of languages to a common language family is traditionally carried out using a complex workflow that relies on manually comparing individual languages. Large-scale standardized collections of multilingual wordlists and grammatical language structures might help to improve this and open new avenues for developing automated language affiliation workflows. Here, we present neural network models that use lexical and grammatical data from a worldwide sample of more than 1,000 languages with known affiliations to classify individual languages into families. In line with the traditional assumption of most linguists, our results show that models trained on lexical data alone outperform models solely based on grammatical data, whereas combining both types of data yields even better performance. In additional experiments, we show how our models can…
Taxonomy
Topics: Natural Language Processing Techniques
