Accent Placement Models for Rigvedic Sanskrit Text

Akhil Rajeev P; Annarao Kulkarni

arXiv:2511.23088 · cs.CL · December 1, 2025

TL;DR

This paper develops and compares neural models for automatic accent placement in Rigvedic Sanskrit, establishing baselines and highlighting practical considerations for heritage language NLP applications.

Contribution

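The paper introduces a parallel corpus of accented and unaccented Rigvedic verses and compares three accent-placement strategies under controlled conditions: full fine-tuning of ByT5, a from-scratch BiLSTM-CRF baseline, and LoRA-based fine-tuning of ByT5. It demonstrates the effectiveness of the fine-tuning approaches and provides reproducible benchmarks.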

Findings

01

Full fine-tuning of ByT5 achieves the lowest error rates across all metrics (WER, CER, and DER).

02

LoRA fine-tuning offers a good efficiency-accuracy balance.

03

Baseline BiLSTM-CRF provides transparency and simplicity.

Abstract

The Rigveda, among the oldest Indian texts in Vedic Sanskrit, employs a distinctive pitch-accent system (udātta, anudātta, svarita) whose marks encode melodic and interpretive cues but are often absent from modern e-texts. This work develops a parallel corpus of accented-unaccented ślokas and conducts a controlled comparison of three strategies for automatic accent placement in Rigvedic verse: (i) full fine-tuning of ByT5, a byte-level Transformer that operates directly on Unicode combining marks, (ii) a from-scratch BiLSTM-CRF sequence-labeling baseline, and (iii) LoRA-based parameter-efficient fine-tuning atop ByT5. Evaluation uses Word Error Rate (WER) and Character Error Rate (CER) for orthographic fidelity, plus a task-specific Diacritic Error Rate (DER) that isolates accent edits. Full ByT5 fine-tuning attains the lowest error across all metrics; LoRA offers strong…
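
To make the three metrics concrete, here is a minimal Python sketch of how WER, CER, and a diacritic-only error rate might be computed for Devanagari text. It is an illustration under stated assumptions, not the paper's evaluation code: it assumes accents are encoded with the Unicode combining marks U+0951 and U+0952, and the `der` function uses a hypothetical definition (mismatched accent slots divided by accented slots in the reference), since the abstract does not give the exact formula. All function names are illustrative.

```python
# Assumption: accents are the Devanagari combining marks U+0951 (stress sign
# udatta, drawn as a vertical stroke above) and U+0952 (stress sign anudatta,
# drawn as a horizontal stroke below). The paper's corpus may use more marks.
ACCENT_MARKS = {"\u0951", "\u0952"}


def strip_accents(text):
    """Remove accent marks, giving the unaccented side of a parallel pair."""
    return "".join(ch for ch in text if ch not in ACCENT_MARKS)


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate over Unicode code points."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


def accent_slots(text):
    """For each non-accent code point, collect the accent marks that follow it."""
    slots = []
    for ch in text:
        if ch in ACCENT_MARKS:
            if slots:
                slots[-1] += ch
        else:
            slots.append("")
    return slots


def der(ref, hyp):
    """Hypothetical diacritic error rate: mismatched slots / accented ref slots.

    Assumes the hypothesis differs from the reference only in accent marks.
    """
    if strip_accents(ref) != strip_accents(hyp):
        raise ValueError("base text differs; this sketch only scores accents")
    ref_slots, hyp_slots = accent_slots(ref), accent_slots(hyp)
    errors = sum(a != b for a, b in zip(ref_slots, hyp_slots))
    accented = sum(1 for a in ref_slots if a)
    return errors / max(accented, 1)


# Opening word of Rigveda 1.1.1 with both accent marks, and a hypothesis
# that drops the second mark.
ref = "अ\u0952ग्निमी\u0951ळे"
hyp = "अ\u0952ग्निमीळे"

print(der(ref, hyp), cer(ref, hyp), wer(ref, hyp))
# A byte-level model such as ByT5 consumes the UTF-8 bytes of this string
# rather than subword tokens; each accent mark occupies three bytes.
print(len(ref.encode("utf-8")))
```

In this example a single missing mark makes the whole word wrong for WER and costs one edit for CER, while the accent-only rate counts one error against the two accented positions. That difference illustrates why a metric that isolates accent edits is reported alongside the orthographic ones.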

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Digital Humanities and Scholarship