Unknown Words Analysis in POS tagging of Sinhala Language
A.J.P.M.P. Jayaweera, N.G.J. Dias

TL;DR
This paper investigates the use of syntactical knowledge and word class distinctions to improve POS tagging of unknown words in Sinhala, demonstrating enhanced performance without human intervention.
Contribution
It introduces a novel approach combining syntactical rules and open/closed class distinctions to better handle unknown words in Sinhala POS tagging.
Findings
Improved tagging accuracy with syntactical rules
Effective unknown word parsing without human input
Enhanced NLP system robustness for Sinhala
Abstract
Part-of-Speech (POS) tagging is a vital task in Natural Language Processing (NLP) for any language. It involves analysing the construction, behaviour and dynamics of the language, knowledge that can be utilized in computational linguistic analysis and automation applications. In this context, dealing with unknown words (words that do not appear in the lexicon) is also an important task, since NLP systems are used in more and more new applications. One aid in predicting the lexical categories of unknown words is the syntactical knowledge of the language. In this research, the distinction between open class words and closed class words, together with syntactical features of the language, is used to predict the lexical categories of unknown words in the tagging process. An experiment is performed to investigate the ability of the…
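The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that closed class words are all listed in the lexicon, so an unknown word can only take an open class tag, and that syntactic knowledge is approximated by tag-bigram probabilities. The tag names, the toy lexicon, the probabilities and the helper functions `guess_unknown_tag` and `tag_sentence` are all hypothetical placeholders.

```python
# Hypothetical tag inventory; the paper's actual Sinhala tag set is not given here.
OPEN_CLASS = {"NOUN", "VERB", "ADJ", "ADV"}

# Toy lexicon of known words (placeholder tokens, not real Sinhala data).
LEXICON = {
    "w_det": "DET",
    "w_noun": "NOUN",
    "w_post": "POSTP",
    "w_verb": "VERB",
}

# Assumed tag-bigram probabilities P(tag | previous tag), standing in for
# the syntactic knowledge of the language. The values are illustrative only.
TRANSITIONS = {
    "<s>": {"DET": 0.5, "NOUN": 0.3, "ADJ": 0.1, "VERB": 0.1},
    "DET": {"NOUN": 0.7, "ADJ": 0.3},
    "ADJ": {"NOUN": 0.8, "ADJ": 0.2},
    "NOUN": {"POSTP": 0.4, "VERB": 0.4, "NOUN": 0.2},
    "POSTP": {"NOUN": 0.5, "VERB": 0.3, "ADJ": 0.2},
    "VERB": {"NOUN": 0.5, "DET": 0.5},
}


def guess_unknown_tag(prev_tag):
    """Pick the most likely open class tag given the previous tag."""
    candidates = TRANSITIONS.get(prev_tag, {})
    # Closed class tags are excluded: unknown words are assumed to be open class.
    open_only = {t: p for t, p in candidates.items() if t in OPEN_CLASS}
    if not open_only:
        return "NOUN"  # assumed fallback when no syntactic evidence is available
    return max(open_only, key=open_only.get)


def tag_sentence(words):
    """Tag known words from the lexicon and guess tags for unknown words."""
    tags = []
    prev = "<s>"  # sentence-start marker
    for word in words:
        tag = LEXICON.get(word) or guess_unknown_tag(prev)
        tags.append(tag)
        prev = tag
    return tags
```

For example, `tag_sentence(["w_det", "xyz", "w_verb"])` looks up the first and last tokens in the lexicon and guesses a tag for the unknown token `xyz` from the preceding `DET` tag. After a `NOUN`, the closed class tag `POSTP` is skipped even though its probability ties with the highest one, because only open class tags are considered for unknown words.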
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
