Universal Dependency Treebank for Odia Language
Shantipriya Parida, Kalyanamalini Sahoo, Atul Kr. Ojha, Saraswati Sahoo, Satya Ranjan Dash, Bijayalaxmi Dash

TL;DR
This paper introduces the first Odia language treebank annotated with Universal Dependencies, along with a preliminary parser, to support linguistic research and language technology development for Odia.
Contribution
It provides the first publicly available Odia treebank with UD annotations and a preliminary machine-learning-based parser, expanding the resources available for Odia NLP research.
Findings
The Odia treebank contains 100 manually annotated sentences (approximately 1,082 tokens).
The preliminary parser achieved 86.6% tokenization accuracy and 64.1% UPOS accuracy.
The resource is intended to support cross-lingual learning and typological studies.
Abstract
This paper presents the first publicly available treebank of Odia, a morphologically rich, low-resource Indian language. The treebank contains approximately 1,082 tokens (100 sentences) selected from "Samantar", the largest available parallel corpus collection for Indic languages. All selected sentences are manually annotated following the Universal Dependencies (UD) guidelines. The morphological analysis of the Odia treebank was performed using machine learning techniques. The annotated treebank will enrich Odia language resources and help in building language technology tools for cross-lingual learning and typological research. We also build a preliminary Odia parser using a machine learning approach. The parser achieves 86.6% tokenization accuracy, 64.1% UPOS accuracy, 63.78% XPOS accuracy, 42.04% UAS and 21.34% LAS. Finally, the paper briefly discusses the linguistic analysis of…
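The abstract reports UAS and LAS without defining them. UAS (unlabeled attachment score) is the percentage of words assigned the correct syntactic head; LAS (labeled attachment score) additionally requires the correct dependency label. The sketch below shows one way to compute both from two CoNLL-U files, under loudly stated assumptions: both files share the same tokenization so words align one-to-one, only the universal part of DEPREL is compared (e.g. "nmod:poss" is treated as "nmod"), and the helper names (`read_conllu`, `attachment_scores`, `row`) and the toy English sentence are hypothetical, not taken from the paper or its parser. When tokenization differs, standard UD evaluation aligns words and reports F1-style scores instead, so this is not necessarily how the paper's figures were obtained.

```python
from typing import List, Tuple

# One syntactic word: (ID, HEAD, DEPREL) from CoNLL-U columns 1, 7 and 8.
Word = Tuple[str, str, str]


def read_conllu(text: str) -> List[Word]:
    """Collect (ID, HEAD, DEPREL) for every syntactic word in CoNLL-U text.

    Comment lines, multiword-token ranges (e.g. "1-2") and empty nodes
    (e.g. "3.1") are skipped because they carry no basic-tree dependency.
    """
    words: List[Word] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        words.append((cols[0], cols[6], cols[7]))
    return words


def attachment_scores(gold: List[Word], pred: List[Word]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent, assuming both files share one tokenization."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files must contain the same words")
    if not gold:
        raise ValueError("nothing to score")
    head_ok = label_ok = 0
    for (g_id, g_head, g_rel), (p_id, p_head, p_rel) in zip(gold, pred):
        if g_id != p_id:
            raise ValueError(f"word ID mismatch: {g_id} vs {p_id}")
        if g_head == p_head:
            head_ok += 1  # correct head counts toward UAS
            # LAS also needs the label; compare only the universal part
            # (e.g. "nmod:poss" -> "nmod"), an assumption of this sketch.
            if g_rel.split(":")[0] == p_rel.split(":")[0]:
                label_ok += 1
    n = len(gold)
    return 100.0 * head_ok / n, 100.0 * label_ok / n


def row(i: int, form: str, head: int, rel: str) -> str:
    """Build a hypothetical CoNLL-U line with only ID, FORM, HEAD and DEPREL filled."""
    return "\t".join([str(i), form, "_", "_", "_", "_", str(head), rel, "_", "_"])


# Toy English example (not from the Odia treebank).
gold_text = "\n".join([
    "# text = She read the book",
    row(1, "She", 2, "nsubj"),
    row(2, "read", 0, "root"),
    row(3, "the", 4, "det"),
    row(4, "book", 2, "obj"),
])
pred_text = "\n".join([
    row(1, "She", 2, "nsubj"),
    row(2, "read", 0, "root"),
    row(3, "the", 2, "det"),    # wrong head
    row(4, "book", 2, "iobj"),  # right head, wrong label
])

print(attachment_scores(read_conllu(gold_text), read_conllu(pred_text)))
# → (75.0, 50.0)
```

In the toy example, three of four heads are correct (UAS 75%), but one of those three carries the wrong label, leaving two fully correct words (LAS 50%). This illustrates why LAS is always at most UAS, as in the paper's reported 42.04% UAS versus 21.34% LAS.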
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
