Transformer-Enabled Diachronic Analysis of Vedic Sanskrit: Neural Methods for Quantifying Types of Language Change

Ananth Hariharan; David Mortensen

arXiv:2512.05364·cs.CL·December 8, 2025

Open Access

TL;DR

This paper introduces a hybrid neural-symbolic approach that uses weak supervision and ensemble methods to analyze over 2,000 years of Sanskrit evolution, revealing complex morphological redistribution rather than simplification.

Contribution

It presents a novel, scalable, and interpretable framework that combines symbolic and neural methods for diachronic linguistic analysis of low-resource languages.

Findings

01 Sanskrit's morphological complexity redistributes over time

02 Verbal features decline cyclically, replaced by compounding and new terminology

03 Ensemble achieves 52.4% feature detection rate with well-calibrated uncertainty

Abstract

This study demonstrates how weakly supervised, hybrid neural-symbolic methods can yield new insights into the evolution of a morphologically rich, low-resource language. We challenge the naive assumption that linguistic change is simplification by quantitatively analyzing over 2,000 years of Sanskrit. Our approach addresses data scarcity through weak supervision, using 100+ high-precision regex patterns to generate pseudo-labels for fine-tuning a multilingual BERT. We then fuse symbolic and neural outputs via a novel confidence-weighted ensemble, creating a system that is both scalable and interpretable. Applying this framework to a 1.47-million-word diachronic corpus, our ensemble achieves a 52.4% overall feature detection rate. Our findings reveal that…
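The two steps named in the abstract, regex pseudo-labeling and confidence-weighted fusion, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two patterns (over IAST-transliterated text), the names `PATTERNS`, `regex_labels`, and `fuse`, and the weighted-average fusion rule are hypothetical and are not taken from the paper, whose actual patterns and ensemble formula are not given in this listing.

```python
import re

# Hypothetical high-precision patterns over IAST-transliterated Sanskrit.
# Illustrative only; these are NOT the patterns used in the paper.
# "\u0101" is the precomposed character a-with-macron.
PATTERNS = {
    "absolutive_tva": re.compile(r"\w+tv\u0101\b"),  # forms ending in -tvā
    "infinitive_tum": re.compile(r"\w+tum\b"),       # forms ending in -tum
}


def regex_labels(text):
    """Weak supervision step: 1.0 if a pattern fires on the text, else 0.0.

    The resulting dict can serve as pseudo-labels for fine-tuning a
    multi-label classifier.
    """
    return {
        name: 1.0 if pattern.search(text) else 0.0
        for name, pattern in PATTERNS.items()
    }


def fuse(p_regex, p_neural, c_regex, c_neural):
    """Assumed fusion rule: confidence-weighted average of two estimates.

    p_* are per-feature probabilities, c_* are non-negative confidences.
    Falls back to 0.5 (maximal uncertainty) when both confidences are zero.
    """
    total = c_regex + c_neural
    if total == 0:
        return 0.5
    return (c_regex * p_regex + c_neural * p_neural) / total


if __name__ == "__main__":
    sentence = "gatv\u0101 gacchati"
    print(regex_labels(sentence))
    # Combine a firing regex with a hesitant neural score.
    print(fuse(p_regex=1.0, p_neural=0.4, c_regex=0.9, c_neural=0.3))
```

Under this assumed rule, a high-confidence regex match dominates a low-confidence neural score, and each fused value can be traced back to its two inputs, which is one way a combined system could remain interpretable.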

Taxonomy

Topics: Language and cultural evolution · Natural Language Processing Techniques · Authorship Attribution and Profiling