IWLV-Ramayana: A Sarga-Aligned Parallel Corpus of Valmiki's Ramayana Across Indian Languages

Sumesh VP

arXiv:2604.13078 · cs.CL · April 16, 2026

TL;DR

The paper presents IWLV-Ramayana, a structured, multilingual, sarga-aligned parallel corpus of Valmiki's Ramayana designed to support cross-linguistic analysis and digital humanities research.

Contribution

The paper introduces what is, to the author's knowledge, the first sarga-aligned multilingual Ramayana corpus with explicit provenance metadata, distributed in a machine-readable format (JSONL).
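The exact record schema is not given in this summary. As a minimal sketch, assuming each JSONL line is one sarga record with hypothetical fields "kanda", "sarga", "lang", "text", and a nested "provenance" object holding "source" and "license" (illustrative names, not the corpus's documented format), a loader could reject records that lack provenance metadata:

```python
import json

# Hypothetical schema: these field names are assumptions for illustration,
# not the documented IWLV-Ramayana format.
REQUIRED_FIELDS = {"kanda", "sarga", "lang", "text", "provenance"}
REQUIRED_PROVENANCE = {"source", "license"}


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check that the assumed fields are present."""
    record = json.loads(line)
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    prov_missing = REQUIRED_PROVENANCE - set(record["provenance"])
    if prov_missing:
        raise ValueError(f"missing provenance fields: {sorted(prov_missing)}")
    return record


# One hypothetical English record for the first sarga of the Bala Kanda.
example = json.dumps({
    "kanda": "Bala",
    "sarga": 1,
    "lang": "en",
    "text": "...",
    "provenance": {"source": "hypothetical-source", "license": "hypothetical-license"},
})
rec = validate_record(example)
print(rec["kanda"], rec["sarga"], rec["lang"])
```

Keeping provenance as a nested object, rather than flat top-level fields, lets each language layer carry its own source and licensing details without changing the alignment keys.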

Findings

01. The corpus includes complete English and Malayalam layers.
02. Hindi, Tamil, Kannada, and Telugu layers are in active development.
03. The corpus enables applications in comparative literature and multilingual NLP.

Abstract

The Ramayana is among the most influential literary traditions of South and Southeast Asia, transmitted across numerous linguistic and cultural contexts over two millennia. Despite extensive scholarship on regional Ramayana traditions, computational resources enabling systematic cross-linguistic analysis remain limited. This paper introduces the IWLV Ramayana Corpus, a structured parallel corpus aligning Valmiki's Ramayana across multiple Indian languages at the level of the sarga (chapter). The corpus currently includes complete English and Malayalam layers, with Hindi, Tamil, Kannada, and Telugu layers in active production. The dataset is distributed in structured JSONL format with explicit provenance metadata, enabling applications in comparative literature, corpus linguistics, digital humanities, and multilingual natural language processing. To our knowledge, this is the first…
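Alignment at the sarga level means that records in different languages can be joined on a shared chapter key. Valmiki's Ramayana is divided into kandas (books), each containing numbered sargas, so a minimal sketch might key each record on (kanda, sarga). The field names "kanda", "sarga", "lang", and "text" and the sample lines below are hypothetical, not taken from the released corpus:

```python
import json
from collections import defaultdict

# Hypothetical JSONL lines; the field names are illustrative assumptions.
JSONL = """\
{"kanda": "Bala", "sarga": 1, "lang": "en", "text": "English sarga 1"}
{"kanda": "Bala", "sarga": 1, "lang": "ml", "text": "Malayalam sarga 1"}
{"kanda": "Bala", "sarga": 2, "lang": "en", "text": "English sarga 2"}
"""


def align_by_sarga(lines, src, tgt):
    """Pair src/tgt texts that share the same (kanda, sarga) key.

    Sargas present in only one language are skipped, which matters while
    some language layers are still incomplete.
    """
    table = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        table[(rec["kanda"], rec["sarga"])][rec["lang"]] = rec["text"]
    return [
        (key, langs[src], langs[tgt])
        for key, langs in sorted(table.items())
        if src in langs and tgt in langs
    ]


pairs = align_by_sarga(JSONL.splitlines(), "en", "ml")
for key, en_text, ml_text in pairs:
    print(key, en_text, "|", ml_text)
```

Here only the first sarga yields an English-Malayalam pair, because the second sarga has no Malayalam record in the sample. The same join extends to any language pair once further layers are complete.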
