Neural Recovery of Historical Lexical Structure in Bantu Languages from Modern Data

Hillary Mutisya; John Mugane

arXiv:2604.22730·cs.LG·April 27, 2026

TL;DR

This study reports that a neural model trained only on modern Bantu morphological data recovers cross-lingual lexical structure consistent with established Proto-Bantu reconstructions and with Guthrie classifications.

Contribution

The paper introduces a transformer-based approach that identifies cognates and reconstructs proto-forms across 14 Eastern and Southern Bantu languages using only modern data.

Findings

01. Of the top 11 noun candidates, 10 (90.9%) match previously reconstructed Proto-Bantu forms

02. Models recover cognate clusters consistent with Guthrie classifications

03. Cross-lingual noun class similarities are statistically significant

Abstract

We investigate whether neural models trained exclusively on modern morphological data can recover cross-lingual lexical structure consistent with historical reconstruction. Using BantuMorph v7, a transformer over Bantu morphological paradigms, we analyze 14 Eastern and Southern Bantu languages, extract encoder embeddings for their noun and verb lemmas, and identify 728 noun and 1,525 verb cognate candidates shared across 5+ languages. Evaluating these candidates against established historical resources, namely the Bantu Lexical Reconstructions database (BLR3; 4,786 reconstructed Proto-Bantu forms) and the ASJP basic vocabulary, we confirm that 10 of the top 11 noun candidates (90.9%) align with previously reconstructed Proto-Bantu forms, including *-ntU 'person' (8 languages), *gombe 'cow' (9 languages), and *mUn (9 languages). Extending to verbs, 12 verb cognates align with reconstructed Proto-Bantu…
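To make the candidate-selection step in the abstract concrete, the following is a minimal sketch of how lemma embeddings might be grouped into cross-lingual cognate candidates. It is not the authors' pipeline: the nearest-neighbour matching rule, the cosine threshold of 0.9, the function name `cognate_candidates`, and the toy vectors and labels are all assumptions made for illustration. Only the "shared across 5+ languages" cutoff comes from the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cognate_candidates(embeddings, seed_lang, threshold=0.9, min_languages=5):
    """Hypothetical selection rule (an assumption, not the paper's method).

    embeddings: {language: {lemma: vector}}
    For each lemma in seed_lang, take the most similar lemma in every other
    language, keep it if the similarity reaches `threshold`, and report the
    cluster only if it spans at least `min_languages` languages.
    """
    candidates = {}
    for seed_lemma, seed_vec in embeddings[seed_lang].items():
        cluster = {seed_lang: seed_lemma}
        for lang, lemmas in embeddings.items():
            if lang == seed_lang or not lemmas:
                continue
            best_lemma, best_sim = max(
                ((lemma, cosine(seed_vec, vec)) for lemma, vec in lemmas.items()),
                key=lambda pair: pair[1],
            )
            if best_sim >= threshold:
                cluster[lang] = best_lemma
        if len(cluster) >= min_languages:
            candidates[seed_lemma] = cluster
    return candidates

# Toy, made-up 3-dimensional vectors; real encoder embeddings would be
# much higher-dimensional and would come from the trained model.
toy = {
    "lang_a": {"person_a": [1.0, 0.1, 0.0], "cow_a": [0.0, 1.0, 0.1]},
    "lang_b": {"person_b": [0.9, 0.2, 0.0], "cow_b": [0.1, 1.0, 0.0]},
    "lang_c": {"person_c": [1.0, 0.0, 0.1], "cow_c": [0.0, 0.9, 0.2]},
    "lang_d": {"person_d": [0.95, 0.1, 0.05], "cow_d": [0.0, 1.0, 0.0]},
    "lang_e": {"person_e": [1.0, 0.15, 0.0], "tree_e": [0.0, 0.0, 1.0]},
}

result = cognate_candidates(toy, seed_lang="lang_a")
print(result)
```

In this toy setup the "person" lemma finds a close neighbour in all five languages and is kept, while the "cow" lemma reaches only four languages and falls below the 5-language cutoff. A real analysis would also need to decide how to handle ties, many-to-many matches, and the choice of seed language, none of which the abstract specifies.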

Code & Models

Models

thiomi/bantumorph-v7
model · 72 dl
