Part-of-Speech Tagging of Odia Language Using Statistical and Deep Learning-Based Approaches
Tusarkanta Dalai, Tapas Kumar Mishra, Pankaj K Sa

TL;DR
This paper develops and compares statistical and deep learning-based POS tagging models for the Odia language, achieving state-of-the-art results despite resource limitations.
Contribution
It introduces CRF, CNN, and Bi-LSTM models for Odia POS tagging, utilizing a new dataset mapped to the UD tagset and exploring various neural feature combinations.
Findings
Bi-LSTM with character features and pre-trained vectors achieved top performance.
A mapping from the BIS tagset to the UD tagset was constructed for consistency.
Deep learning models outperform traditional approaches in Odia POS tagging.
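The BIS-to-UD mapping mentioned in the findings can be pictured as a simple lookup table. The sketch below is illustrative only: the tag pairs are a small, commonly cited subset of BIS and UD labels chosen for the example, not the paper's actual mapping table, and the `map_tags` helper and its `fallback` parameter are hypothetical names.

```python
# Illustrative subset of a BIS -> UD tag mapping.
# Assumption: these pairs are examples only; the paper's full mapping
# table is not reproduced in this summary.
BIS_TO_UD = {
    "N_NN": "NOUN",     # common noun
    "N_NNP": "PROPN",   # proper noun
    "PR_PRP": "PRON",   # personal pronoun
    "V_VM": "VERB",     # main verb
    "V_VAUX": "AUX",    # auxiliary verb
    "JJ": "ADJ",        # adjective
    "RB": "ADV",        # adverb
    "PSP": "ADP",       # postposition
    "CC_CCD": "CCONJ",  # coordinating conjunction
    "CC_CCS": "SCONJ",  # subordinating conjunction
    "QT_QTC": "NUM",    # cardinal number
    "RD_PUNC": "PUNCT", # punctuation
}


def map_tags(tagged_sentence, fallback="X"):
    """Convert (word, BIS tag) pairs to (word, UD tag) pairs.

    Tags missing from the table are mapped to `fallback` (hypothetical
    choice: UD's catch-all tag "X") so unknown labels stay visible.
    """
    return [(word, BIS_TO_UD.get(tag, fallback)) for word, tag in tagged_sentence]


# Usage with placeholder tokens rather than real Odia words.
example = [("w1", "N_NN"), ("w2", "V_VM"), ("w3", "RD_PUNC")]
print(map_tags(example))
```

Mapping unknown tags to a catch-all value instead of raising an error makes gaps in the table easy to spot when converting a full corpus.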
Abstract
Automatic part-of-speech (POS) tagging is a preprocessing step for many natural language processing (NLP) tasks such as named entity recognition (NER), speech processing, information extraction, word sense disambiguation, and machine translation. It has already achieved promising results in English and European languages, but in Indian languages, particularly Odia, it is not yet well explored because of the lack of supporting tools and resources and the morphological richness of the language. We were unable to locate an open-source POS tagger for Odia, and only a handful of attempts have been made to develop POS taggers for the language. The main contribution of this research is to present conditional random field (CRF) and deep learning-based approaches (CNN and bidirectional long short-term memory, Bi-LSTM) for developing an Odia part-of-speech tagger. We used a publicly…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices
Methods: Conditional Random Field
