Sanskrit Sandhi Splitting using seq2(seq)^2

Rahul Aralikatte; Neelamadhav Gantayat; Naveen Panwar; Anush Sankaran; Senthil Mani

arXiv:1801.00428 · cs.CL · July 16, 2019

TL;DR

This paper introduces Double Decoder RNN (DD-RNN), a deep learning model for Sanskrit Sandhi splitting. It predicts split locations with 95% accuracy and constituent words with 79.5% accuracy, outperforming the state of the art by 20%, and it generalizes to Chinese word segmentation.

Contribution

The paper presents the DD-RNN architecture for Sanskrit Sandhi splitting and shows that the same model is also effective for Chinese word segmentation.

Findings

01 Split location prediction accuracy of 95%

02 Constituent word prediction accuracy of 79.5%

03 Outperforms state-of-the-art methods by 20%

Abstract

In Sanskrit, small words (morphemes) are combined to form compound words through a process known as Sandhi. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although rules governing word splitting exist in the language, it is highly challenging to identify the location of the splits in a compound word. Existing Sandhi splitting systems incorporate these pre-defined splitting rules, but their accuracy is low because the same compound word can be broken down in multiple ways that all yield syntactically correct splits. In this research, we propose a novel deep learning architecture called Double Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95% accuracy, and (ii) predicts the constituent words (learning the Sandhi splitting rules) with 79.5% accuracy, outperforming the state of the art by 20%. Additionally, we…
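The ambiguity described in the abstract can be illustrated with a small sketch. It is not the paper's DD-RNN and not any existing splitter: the rule table `REVERSE_RULES` and the function `candidate_splits` are hypothetical names, and the table covers only three well-known vowel Sandhi patterns (a/ā + a/ā → ā, a/ā + u → o, a/ā + i → e) on IAST-transliterated input. Even with so few rules, one compound yields several rule-consistent splits, which is why a rule-only system struggles to pick the right one.

```python
import unicodedata


def _nfc(text):
    """Normalize to NFC so each IAST letter such as 'ā' is one code point."""
    return unicodedata.normalize("NFC", text)


# Toy reverse-Sandhi table (illustrative assumption, far from complete):
# surface vowel -> possible (end of left word, start of right word) pairs.
REVERSE_RULES = {
    _nfc("ā"): [("a", "a"), ("a", _nfc("ā")), (_nfc("ā"), "a"), (_nfc("ā"), _nfc("ā"))],
    "o": [("a", "u"), (_nfc("ā"), "u")],
    "e": [("a", "i"), (_nfc("ā"), "i")],
}


def candidate_splits(compound):
    """Return every (position, left, right) split the toy rules allow.

    No dictionary lookup is done, so the candidates are only consistent
    with the rules; they are not guaranteed to be real words.
    """
    word = _nfc(compound)
    candidates = []
    for i, ch in enumerate(word):
        # A split needs material on both sides of the fused vowel.
        if i == 0 or i == len(word) - 1:
            continue
        for left_end, right_start in REVERSE_RULES.get(ch, []):
            candidates.append((i, word[:i] + left_end, right_start + word[i + 1:]))
    return candidates


if __name__ == "__main__":
    for position, left, right in candidate_splits("vidyālayaḥ"):
        print(position, left, "+", right)
```

For `vidyālayaḥ` the toy table produces four candidates at the same position, of which only `vidyā + ālayaḥ` is the intended split. Choosing the position and then the correct pair of words are the two sub-problems the abstract assigns to DD-RNN.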
