Accent Placement Models for Rigvedic Sanskrit Text
Akhil Rajeev P, Annarao Kulkarni

TL;DR
This paper develops and compares neural models for automatic accent placement in Rigvedic Sanskrit, establishing baselines and highlighting practical considerations for heritage language NLP applications.
Contribution
The paper introduces a parallel corpus of accented and unaccented verses and evaluates three strategies for accent placement, demonstrating the effectiveness of fine-tuning approaches and providing reproducible benchmarks.
Findings
Full fine-tuning of ByT5 achieves the lowest error rates across all metrics (WER, CER, DER).
LoRA fine-tuning offers a good efficiency-accuracy balance.
Baseline BiLSTM-CRF provides transparency and simplicity.
Abstract
The Rigveda, among the oldest Indian texts in Vedic Sanskrit, employs a distinctive pitch-accent system (udātta, anudātta, svarita) whose marks encode melodic and interpretive cues but are often absent from modern e-texts. This work develops a parallel corpus of accented-unaccented ślokas and conducts a controlled comparison of three strategies for automatic accent placement in Rigvedic verse: (i) full fine-tuning of ByT5, a byte-level Transformer that operates directly on Unicode combining marks, (ii) a from-scratch BiLSTM-CRF sequence-labeling baseline, and (iii) LoRA-based parameter-efficient fine-tuning atop ByT5. Evaluation uses Word Error Rate (WER) and Character Error Rate (CER) for orthographic fidelity, plus a task-specific Diacritic Error Rate (DER) that isolates accent edits. Full ByT5 fine-tuning attains the lowest error across all metrics; LoRA offers strong…
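To illustrate what a metric that isolates accent edits might look like, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not give the DER formula, so the definition used here (the share of base-character positions whose attached accent marks differ, given identical base text) is an assumption. The function names are hypothetical, as is the choice of U+0951 and U+0952 as the only accent marks tracked; a real corpus may also use marks from the Vedic Extensions block.

```python
# Assumption: accents are encoded as these two Devanagari combining marks.
# U+0951 DEVANAGARI STRESS SIGN UDATTA, U+0952 DEVANAGARI STRESS SIGN ANUDATTA.
ACCENT_MARKS = {"\u0951", "\u0952"}


def strip_accents(text):
    """Drop accent marks, giving the unaccented side of a parallel pair."""
    return "".join(ch for ch in text if ch not in ACCENT_MARKS)


def split_base_and_accents(text):
    """Return (base_char, accent_marks) pairs, one per non-accent code point."""
    slots = []
    for ch in text:
        if ch in ACCENT_MARKS and slots:
            base, marks = slots[-1]
            slots[-1] = (base, marks + ch)  # attach mark to preceding code point
        else:
            slots.append((ch, ""))
    return slots


def diacritic_error_rate(reference, hypothesis):
    """Fraction of base positions whose accent marks differ (assumed definition).

    Requires the two strings to share the same unaccented text, so that
    only accent placement is being scored.
    """
    ref = split_base_and_accents(reference)
    hyp = split_base_and_accents(hypothesis)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("base text differs; DER here assumes identical base text")
    if not ref:
        return 0.0
    errors = sum(r_marks != h_marks for (_, r_marks), (_, h_marks) in zip(ref, hyp))
    return errors / len(ref)


# Example: the opening word of the Rigveda with both accents (reference)
# and a hypothetical model output that misses the second accent mark.
reference = "\u0905\u0952\u0917\u094D\u0928\u093F\u092E\u0940\u0951\u0933\u0947"
hypothesis = "\u0905\u0952\u0917\u094D\u0928\u093F\u092E\u0940\u0933\u0947"
unaccented = strip_accents(reference)
der = diacritic_error_rate(reference, hypothesis)  # one wrong slot out of nine
```

Scoring per code point rather than per syllable keeps the sketch short; a syllable-level variant would need an akshara segmenter, which is outside what the abstract describes.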
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Digital Humanities and Scholarship
