Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit
Jivnesh Sandhan

TL;DR
This thesis develops linguistically-informed neural models for Sanskrit NLP tasks, addressing challenges such as rich morphology, compounding, free word order, and scarce training data, and introduces a toolkit to make Sanskrit manuscripts more accessible.
Contribution
The thesis proposes novel neural architectures tailored for Sanskrit, reports state-of-the-art results, and provides a web-based toolkit for real-time linguistic analysis.
Findings
Achieved state-of-the-art performance in Sanskrit NLP tasks
Developed a web-based toolkit for real-time Sanskrit analysis
Enhanced accessibility of Sanskrit manuscripts through NLP technologies
Abstract
The primary focus of this thesis is to make Sanskrit manuscripts more accessible to end users through natural language technologies. The morphological richness, compounding, free word order, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks that are crucial for developing robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a fundamental text processing step for downstream applications. However, it is challenging due to the sandhi phenomenon, which modifies characters at word boundaries. Similarly, existing dependency parsing approaches struggle with morphologically rich and low-resource languages like Sanskrit. Compound type identification is also…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Focus
