From Isolates to Families: Using Neural Networks for Automated Language Affiliation

Frederic Blum; Steffen Herbold; Johann-Mattis List

arXiv:2502.11688 · cs.CL · December 9, 2025 · 2 cites

TL;DR

This paper introduces neural network models that classify languages into families using lexical and grammatical data, improving automation in historical linguistics and revealing deep language relationships.

Contribution

The paper presents the first neural network approach that combines lexical and grammatical data for automated language family classification, as a complement to the traditional workflow of manual language comparison.

Findings

01 Models trained on lexical data alone outperform models based solely on grammatical data.

02 Combining lexical and grammatical data yields better classification accuracy than either data type alone.

03 Models can identify relationships among language subgroups and suggest affiliations for language isolates.
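
One way to picture the combination of lexical and grammatical data is as a concatenation of two feature vectors per language before classification. The sketch below is only an illustration under stated assumptions: the binary feature vectors, the family labels, the function names (`combine_features`, `train`, `predict`), and the single softmax layer are all hypothetical and are not taken from the paper, whose actual architecture and data encoding are not described on this page.

```python
import math
import random


def combine_features(lexical, grammatical):
    """Concatenate a lexical and a grammatical feature vector into one input."""
    return list(lexical) + list(grammatical)


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Forward pass of a single softmax layer (no hidden layers)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train(samples, labels, n_classes, epochs=200, lr=0.5, seed=0):
    """Fit the layer with plain stochastic gradient descent on cross-entropy."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            probs = predict_proba(weights, biases, x)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j, xj in enumerate(x):
                    weights[c][j] -= lr * grad * xj
    return weights, biases


# Invented toy data: four "languages", two hypothetical families (0 and 1).
lexical = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1], [0, 1, 1, 1]]
grammatical = [[1, 0], [1, 1], [0, 1], [0, 0]]
families = [0, 0, 1, 1]

samples = [combine_features(l, g) for l, g in zip(lexical, grammatical)]
weights, biases = train(samples, families, n_classes=2)


def predict(x):
    """Return the index of the most probable family for input x."""
    probs = predict_proba(weights, biases, x)
    return max(range(len(probs)), key=probs.__getitem__)


predictions = [predict(x) for x in samples]
```

Dropping either argument of `combine_features` gives the lexical-only or grammatical-only setting, which is the kind of comparison the findings describe; the toy data here is far too small to say anything about which setting performs better.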

Abstract

In historical linguistics, the affiliation of languages to a common language family is traditionally carried out using a complex workflow that relies on manually comparing individual languages. Large-scale standardized collections of multilingual wordlists and grammatical language structures might help to improve this and open new avenues for developing automated language affiliation workflows. Here, we present neural network models that use lexical and grammatical data from a worldwide sample of more than 1,000 languages with known affiliations to classify individual languages into families. In line with the traditional assumption of most linguists, our results show that models trained on lexical data alone outperform models solely based on grammatical data, whereas combining both types of data yields even better performance. In additional experiments, we show how our models can…

Taxonomy

Topics: Natural Language Processing Techniques