Transformer-Enabled Diachronic Analysis of Vedic Sanskrit: Neural Methods for Quantifying Types of Language Change
Ananth Hariharan, David Mortensen

TL;DR
This paper introduces a hybrid neural-symbolic approach that uses weak supervision and ensemble methods to analyze 2,000 years of Sanskrit evolution, revealing complex morphological redistribution rather than simplification.
Contribution
It presents a novel, scalable, and interpretable framework that combines symbolic and neural methods for diachronic linguistic analysis of low-resource languages.
Findings
Sanskrit's morphological complexity redistributes over time
Verbal features decline cyclically, replaced by compounding and new terminology
Ensemble achieves 52.4% feature detection rate with well-calibrated uncertainty
Abstract
We challenge the naive assumption that linguistic change is simplification by quantitatively analyzing over 2,000 years of Sanskrit, demonstrating how weakly-supervised hybrid neural-symbolic methods can yield significant new insights into the evolution of morphologically rich, low-resource languages. Our approach addresses data scarcity through weak supervision, using 100+ high-precision regex patterns to generate pseudo-labels for fine-tuning a multilingual BERT model. We then fuse symbolic and neural outputs via a novel confidence-weighted ensemble, creating a system that is both scalable and interpretable. Applying this framework to a 1.47-million-word diachronic corpus, our ensemble achieves a 52.4% overall feature detection rate. Our findings reveal that…
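To make the two mechanisms named in the abstract concrete, here is a minimal sketch of regex-based pseudo-labeling followed by a confidence-weighted fusion of a symbolic and a neural score. Everything in it is an assumption for illustration: the two patterns, their precision values, the ASCII transliteration, the function names, and the fusion formula are hypothetical and are not taken from the paper, which does not spell out its rules or its weighting scheme here. The fine-tuning step is omitted, so `neural_p` is a stand-in for a classifier probability.

```python
import re

# Hypothetical rules over simplified ASCII transliteration:
# feature name -> (compiled pattern, assumed rule precision).
# These are NOT the paper's patterns.
RULES = {
    "infinitive": (re.compile(r"\b\w+tum\b"), 0.90),
    "gerundive": (re.compile(r"\b\w+tavy\w*\b"), 0.85),
}


def pseudo_label(text):
    """Map each feature whose rule fires on `text` to that rule's assumed precision."""
    return {feat: prec for feat, (pat, prec) in RULES.items() if pat.search(text)}


def confidence(p):
    """Distance of a probability from 0.5, rescaled to the range [0, 1]."""
    return abs(p - 0.5) * 2.0


def fuse(symbolic_p, neural_p):
    """Confidence-weighted mean of two probabilities for one feature.

    A score of 0.5 means "abstain": it gets weight 0 and the other source decides.
    This weighting is one plausible choice, assumed here for illustration.
    """
    w_s, w_n = confidence(symbolic_p), confidence(neural_p)
    if w_s + w_n == 0.0:
        return 0.5  # both sources abstain
    return (w_s * symbolic_p + w_n * neural_p) / (w_s + w_n)


# Usage: the rule fires on "kartum"; 0.6 stands in for a neural model's probability.
labels = pseudo_label("sa kartum icchati")
fused = fuse(labels.get("infinitive", 0.5), neural_p=0.6)
```

In this sketch the more confident source dominates the fused score, and the symbolic side stays inspectable because every pseudo-label traces back to a named rule.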
Taxonomy
Topics: Language and cultural evolution · Natural Language Processing Techniques · Authorship Attribution and Profiling
