Number Theory Meets Linguistics: Modelling Noun Pluralisation Across 1497 Languages Using 2-adic Metrics
Gregory Baker, Diego Molla-Aliod

TL;DR
This paper models noun pluralization across 1497 languages as a linear regression problem that minimizes a p-adic (specifically 2-adic) metric, and reports substantial improvements over Euclidean-space regressors in several language families.
Contribution
It presents a simple linear regression model of pluralization that minimizes a p-adic metric and outperforms Euclidean-space regressors on languages from seven families, including Indo-European, Austronesian and Sino-Tibetan.
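The model minimises a p-adic metric rather than a Euclidean one. The minimal Python sketch below shows the 2-adic absolute value and distance on integers. The function names are hypothetical, and the sketch assumes the quantities being compared are already integers, because this summary does not describe how the paper encodes word forms numerically.

```python
from fractions import Fraction

def p_adic_valuation(n: int, p: int = 2) -> int:
    """Largest k such that p**k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is +infinity")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def p_adic_abs(n: int, p: int = 2) -> Fraction:
    """p-adic absolute value |n|_p = p**(-v_p(n)), with |0|_p = 0."""
    if n == 0:
        return Fraction(0)
    return Fraction(1, p ** p_adic_valuation(n, p))

def p_adic_distance(x: int, y: int, p: int = 2) -> Fraction:
    """p-adic distance d(x, y) = |x - y|_p."""
    return p_adic_abs(x - y, p)

print(p_adic_distance(0, 1024))  # 1/1024
print(p_adic_distance(3, 4))     # 1
```

Here 0 and 1024 are 2-adically close (distance 1/1024) even though they are far apart on the real line, while the neighbouring integers 3 and 4 sit at the maximum distance 1. The metric is also an ultrametric: d(x, z) <= max(d(x, y), d(y, z)) for all x, y, z.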
Findings
P-adic regression outperforms Euclidean methods in several language families.
There is insufficient evidence to support modeling distinct noun declensions as p-adic neighborhoods, even in Indo-European languages.
The approach bridges number theory and linguistics for morphological modeling.
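A p-adic neighbourhood (ball) of radius p^-k around an integer c contains exactly the integers congruent to c modulo p^k, because |x - c|_p <= p^-k holds precisely when p^k divides x - c. The sketch below, with hypothetical function names, groups integers into 2-adic balls. Treating each ball as a candidate declension class is an illustrative assumption only; the paper reports insufficient evidence for modelling declensions this way.

```python
from collections import defaultdict

def in_p_adic_ball(x: int, centre: int, k: int, p: int = 2) -> bool:
    """True if |x - centre|_p <= p**(-k), i.e. p**k divides x - centre."""
    return (x - centre) % (p ** k) == 0

def group_into_p_adic_balls(values, k: int, p: int = 2) -> dict:
    """Partition integers into the p-adic balls of radius p**(-k).

    Each ball is labelled by its residue modulo p**k (0 <= label < p**k).
    """
    balls = defaultdict(list)
    for v in values:
        balls[v % (p ** k)].append(v)
    return dict(balls)

print(group_into_p_adic_balls([1, 5, 9, 2, 6, 3], k=2))
# {1: [1, 5, 9], 2: [2, 6], 3: [3]}
```

Raising k shrinks the radius and splits each ball into p smaller ones, so the balls form a nested hierarchy in which any two balls are either disjoint or one contains the other.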
Abstract
A simple machine learning model of pluralisation as a linear regression problem minimising a p-adic metric substantially outperforms even the most robust of Euclidean-space regressors on languages in the Indo-European, Austronesian, Trans-New Guinea, Sino-Tibetan, Nilo-Saharan, Oto-Manguean and Atlantic-Congo language families. There is insufficient evidence to support modelling distinct noun declensions as a p-adic neighbourhood even in Indo-European languages.
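To make "linear regression minimising a p-adic metric" concrete, the following toy Python sketch fits an integer line y = a*x + b by minimising the summed 2-adic residuals. The exhaustive search over a small integer grid, the function names, and the integer encoding of inputs and outputs are all assumptions for illustration. This summary does not describe the authors' optimiser or feature encoding, so this is not their method.

```python
from fractions import Fraction
from itertools import product

def p_adic_abs(n: int, p: int = 2) -> Fraction:
    """p-adic absolute value of an integer, with |0|_p = 0."""
    if n == 0:
        return Fraction(0)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return Fraction(1, p ** k)

def p_adic_loss(a: int, b: int, xs, ys, p: int = 2) -> Fraction:
    """Sum of p-adic residuals |a*x + b - y|_p over the data."""
    return sum(p_adic_abs(a * x + b - y, p) for x, y in zip(xs, ys))

def fit_p_adic_line(xs, ys, search=range(-16, 17), p: int = 2):
    """Exhaustive search for integer (a, b) minimising the p-adic loss.

    Hypothetical toy optimiser, not the paper's fitting procedure.
    """
    return min(product(search, search),
               key=lambda ab: p_adic_loss(ab[0], ab[1], xs, ys, p))

xs = [0, 1, 2, 3, 4]
ys = [1, 4, 7, 10, 1000]  # four points on y = 3x + 1, one outlier
print(fit_p_adic_line(xs, ys))  # (3, 1)
```

The 2-adic fit recovers (3, 1) with loss 1, because each integer residual contributes at most 1 to the loss however large it is in absolute terms. A least-squares line through the same points would instead be pulled strongly toward the outlier.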
Taxonomy
Topics: Advanced mathematical theories · Authorship Attribution and Profiling · Topological and Geometric Data Analysis
Methods: Linear Regression
