Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields
Georgi Georgiev, Preslav Nakov, Kuzman Ganchev, Petya Osenova, Kiril Ivanov Simov

TL;DR
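This paper introduces a feature-rich Conditional Random Fields approach to Bulgarian named entity recognition. It combines features established for other languages with language-specific lexical, syntactic, and morphological information, gazetteers, and unlabeled data, reaching F1 = 89.4%, which is comparable to state-of-the-art results for English.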
Contribution
It develops a Bulgarian NER system that uses extensive lexical, syntactic, and morphological features, including task-specific tagsets derived from the 680 morpho-syntactic tags of the BulTreeBank and domain-specific gazetteers.
Findings
Achieved F1 score of 89.4% on Bulgarian NER
Integrated language-specific features, domain-specific gazetteers, and unlabeled data
Performance comparable to state-of-the-art results for English NER
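To make the reported F1 figure concrete, the sketch below computes an entity-level F1 from BIO label sequences. It assumes the usual exact-match convention (a predicted entity counts only if its span and type both match the gold annotation); the summary here does not state the paper's exact scoring protocol, and the function names `extract_spans` and `entity_f1` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def extract_spans(labels: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO-labelled sequence."""
    spans: Set[Span] = set()
    start, kind = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, label in enumerate(labels + ["O"]):
        begins = label.startswith("B-")
        continues = label.startswith("I-") and kind == label[2:]
        if start is not None and not continues:
            spans.add((start, i, kind))
            start, kind = None, None
        # Lenient handling: an "I-" tag with no open entity starts a new one.
        if begins or (label.startswith("I-") and start is None):
            start, kind = i, label[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 (assumed convention, see lead-in)."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with gold labels `["B-PER", "I-PER", "O", "B-LOC"]` and predictions `["B-PER", "I-PER", "O", "O"]`, precision is 1.0 and recall is 0.5, so F1 is 2/3.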
Abstract
The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in news text for Bulgarian. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1=89.4%, which is comparable to the state-of-the-art results for English.
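The kinds of features the abstract describes can be illustrated with a minimal sketch of per-token feature extraction for a CRF tagger. Everything in it is an assumption for illustration rather than the authors' feature set: the toy gazetteers, the example tags, the feature names, and in particular `coarse_tag`, which reduces a full morpho-syntactic tag by keeping its first characters as one plausible way to derive a smaller tagset.

```python
from typing import Dict, List, Set

# Hypothetical toy gazetteers; the paper's actual resources are not shown here.
PERSON_GAZETTEER: Set[str] = {"иван", "георги", "петя"}
LOCATION_GAZETTEER: Set[str] = {"софия", "пловдив", "варна"}


def word_shape(token: str) -> str:
    """Collapse a token to a coarse shape with repeated codes merged."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            code = "X"
        elif ch.islower():
            code = "x"
        elif ch.isdigit():
            code = "d"
        else:
            code = ch
        if not shape or shape[-1] != code:
            shape.append(code)
    return "".join(shape)


def coarse_tag(full_tag: str, length: int = 2) -> str:
    """Assumed tagset reduction: keep a prefix of the full tag."""
    return full_tag[:length]


def token_features(tokens: List[str], tags: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    token = tokens[i]
    lowered = token.lower()
    return {
        # Lexical and orthographic features
        "word.lower": lowered,
        "shape": word_shape(token),
        "suffix3": lowered[-3:],
        "is_capitalized": token[:1].isupper(),
        # Morpho-syntactic features: full tag plus a reduced variant
        "tag.full": tags[i],
        "tag.coarse": coarse_tag(tags[i]),
        # Gazetteer membership
        "gaz.person": lowered in PERSON_GAZETTEER,
        "gaz.location": lowered in LOCATION_GAZETTEER,
        # Context window of one token on each side
        "prev.word.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.word.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
```

Calling `token_features` for each position of a sentence such as `["Иван", "живее", "в", "София"]` yields one dictionary per token, a shape that common CRF toolkits accept as input; the choice of toolkit is left open here.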
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
