Feature-Rich Part-of-speech Tagging for Morphologically Complex Languages: Application to Bulgarian
Georgi Georgiev, Valentin Zhikov, Petya Osenova, Kiril Simov, Preslav Nakov

TL;DR
This paper introduces a feature-rich POS tagging approach for Bulgarian that uses a large morphological lexicon and a set of 680 morpho-syntactic tags, reaching 97.98% accuracy, a significant improvement over the previous state of the art for Bulgarian.
Contribution
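The paper combines a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, and it tags Bulgarian with 680 morpho-syntactic tags rather than the small number of grammatical categories used in most previous work.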
Findings
Achieved 97.98% tagging accuracy
Significant improvement over previous Bulgarian POS taggers
Demonstrated effectiveness of large morphological lexicons
Abstract
We present experiments with part-of-speech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving accuracy of 97.98%, which is a significant improvement over the state-of-the-art for Bulgarian.
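To make the idea of combining a morphological lexicon with learning from an annotated corpus more concrete, here is a minimal Python sketch. It is not the paper's guided-learning algorithm: it swaps in a simple greedy tagger with count-based scores, and the lexicon entries, the three-tag inventory, and the function names (candidate_tags, train, tag) are all hypothetical and chosen only for illustration. The point it shows is that a lexicon can restrict each word to the tags it licenses, so the learned model only has to choose among a few candidates instead of the full tag set.

```python
from collections import Counter, defaultdict

# Toy morphological lexicon: word form -> set of permitted tags.
# Entries and tag names are invented for illustration; a real system
# for Bulgarian would use a far larger lexicon and tag inventory.
LEXICON = {
    "the": {"DET"},
    "run": {"NOUN", "VERB"},
    "dogs": {"NOUN"},
}
ALL_TAGS = {"DET", "NOUN", "VERB"}


def candidate_tags(word):
    """Return the tags the lexicon allows; unknown words get the full tag set."""
    return LEXICON.get(word.lower(), ALL_TAGS)


def train(tagged_sentences):
    """Count (previous tag -> tag) and (word -> tag) pairs from annotated data."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag_ in sentence:
            trans[prev][tag_] += 1
            emit[word.lower()][tag_] += 1
            prev = tag_
    return trans, emit


def tag(words, trans, emit):
    """Greedy left-to-right tagging restricted to lexicon-licensed candidates."""
    prev, output = "<s>", []
    for word in words:
        # Sort candidates so ties are broken deterministically.
        candidates = sorted(candidate_tags(word))
        best = max(candidates, key=lambda t: emit[word.lower()][t] + trans[prev][t])
        output.append(best)
        prev = best
    return output
```

With a tiny corpus in which "run" occurs once as a verb and once as a noun, the tagger uses the preceding tag to pick among the two tags the lexicon permits for "run", while "the" and "dogs" are resolved by the lexicon alone.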
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
