Feature-Rich Part-of-speech Tagging for Morphologically Complex Languages: Application to Bulgarian

Georgi Georgiev; Valentin Zhikov; Petya Osenova; Kiril Simov; Preslav Nakov

arXiv:1911.11503·cs.CL·November 27, 2019·24 cites

Open Access

TL;DR

This paper introduces a feature-rich POS tagging approach for Bulgarian that combines a large morphological lexicon with a set of 680 morpho-syntactic tags, reaching 97.98% accuracy and improving on the previous state of the art for the language.

Contribution

The paper combines a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus to tag Bulgarian text with a set of 680 morpho-syntactic tags.

Findings

01 Achieved 97.98% tagging accuracy

02 Significant improvement over previous Bulgarian POS taggers

03 Demonstrated effectiveness of large morphological lexicons

Abstract

We present experiments with part-of-speech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving accuracy of 97.98%, which is a significant improvement over the state-of-the-art for Bulgarian.
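
The idea of combining a morphological lexicon with a learned tagger can be illustrated with a minimal sketch. It is not the authors' guided-learning system: the toy lexicon, the coarse English tags, the per-word score dictionaries, and the function names `candidate_tags`, `tag_sentence`, and `accuracy` are all hypothetical and chosen only for illustration. The sketch assumes that some model has already produced a score for each tag of each word, uses the lexicon to restrict which tags may be chosen, and measures token-level accuracy against gold tags.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: word form -> tags the lexicon permits for it.
# These tags are illustrative only; they are not the paper's 680-tag set.
LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "can": {"NOUN", "VERB", "AUX"},
    "rusts": {"VERB", "NOUN"},
}
ALL_TAGS: Set[str] = {"DET", "NOUN", "VERB", "AUX"}


def candidate_tags(word: str) -> Set[str]:
    """Return the lexicon-licensed tags, or every tag for an unknown word."""
    return LEXICON.get(word.lower(), ALL_TAGS)


def tag_sentence(words: List[str], scores: List[Dict[str, float]]) -> List[str]:
    """For each word, pick the highest-scoring tag among its lexicon candidates.

    `scores` stands in for the output of a trained model (an assumption here):
    one dict per word, mapping tag -> score. Missing tags count as -infinity.
    """
    result = []
    for word, word_scores in zip(words, scores):
        allowed = sorted(candidate_tags(word))  # sorted for deterministic ties
        best = max(allowed, key=lambda t: word_scores.get(t, float("-inf")))
        result.append(best)
    return result


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Token-level accuracy: fraction of positions where the tags agree."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


words = ["The", "can", "rusts"]
scores = [
    {"NOUN": 0.9, "DET": 0.5},   # the model prefers NOUN, the lexicon forbids it
    {"NOUN": 0.6, "VERB": 0.3},
    {"VERB": 0.7, "NOUN": 0.2},
]
predicted = tag_sentence(words, scores)
print(predicted)                                     # ['DET', 'NOUN', 'VERB']
print(accuracy(predicted, ["DET", "NOUN", "VERB"]))  # 1.0
```

In the first word the highest raw score goes to a tag the lexicon does not license, so the constraint overrides it. With a tag set of several hundred tags, this kind of filtering shrinks the set of candidates the learner has to choose among for each known word.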

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification