Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields

Georgi Georgiev; Preslav Nakov; Kuzman Ganchev; Petya Osenova; Kiril Ivanov Simov

arXiv:2109.15121·cs.CL·October 1, 2021·22 cites

TL;DR

This paper introduces a feature-rich Conditional Random Fields approach to Bulgarian Named Entity Recognition. It combines features established for other languages with Bulgarian-specific features and resources, reaching F1 = 89.4%, which is comparable to state-of-the-art results for English.

Contribution

The paper develops a Bulgarian NER system that uses lexical, syntactic, and morphological features, task-specific tagsets derived from the BulTreeBank annotation (680 morpho-syntactic tags), domain-specific gazetteers, and additional unlabeled data.
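
To make the idea of a feature-rich token representation concrete, here is a minimal sketch of the kind of per-token feature dictionary a CRF tagger typically consumes. It is not the paper's feature set: the function names, the feature names, the two-character tag truncation used as a stand-in for a derived tagset, the example tags, and the tiny gazetteer are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], tags: List[str], i: int,
                   gazetteer: Set[str]) -> Dict[str, object]:
    """Build a feature dict for token i (hypothetical feature set)."""
    word, tag = tokens[i], tags[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        # Lexical and orthographic features
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "suffix3": word[-3:].lower(),
        # Morpho-syntactic features: the full tag plus a coarser view of it.
        # Truncating to two characters is an assumption, not the paper's rule.
        "tag.full": tag,
        "tag.coarse": tag[:2],
        # Gazetteer lookup (lowercased exact match, an assumption)
        "in_gazetteer": word.lower() in gazetteer,
    }
    # Context features from the neighbouring tokens
    if i > 0:
        feats["prev.word.lower"] = tokens[i - 1].lower()
        feats["prev.tag.coarse"] = tags[i - 1][:2]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next.word.lower"] = tokens[i + 1].lower()
        feats["next.tag.coarse"] = tags[i + 1][:2]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str], tags: List[str],
                      gazetteer: Set[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, tags, i, gazetteer)
            for i in range(len(tokens))]


# Illustrative input: the tags are placeholders in a positional style.
tokens = ["Георги", "живее", "в", "София"]
tags = ["Npmsi", "Vpitf", "R", "Npfsi"]
features = sentence_features(tokens, tags, gazetteer={"софия"})
```

A list of such dictionaries per sentence, paired with a label sequence over the four entity categories, is the usual input format for a linear-chain CRF trainer.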

Findings

01 Achieves an F1 score of 89.4% on Bulgarian NER

02 Integrates language-specific features, domain-specific gazetteers, and unlabeled data

03 Performs comparably to state-of-the-art English NER systems

Abstract

The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in news text for Bulgarian. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1=89.4%, which is comparable to the state-of-the-art results for English.
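
For readers unfamiliar with how an F1 figure like the one above is usually computed for NER, the following sketch scores predictions at the entity level. It assumes BIO-encoded labels and exact-match scoring (an entity counts as correct only if both its boundaries and its type match), which is the common CoNLL-style convention; this page does not state the paper's exact evaluation protocol, and the function names and label strings here are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(labels: List[str]) -> Set[Span]:
    """Convert one BIO label sequence into a set of entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a final span
        continues = lab.startswith("I-") and etype == lab[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        # A stray I- tag with no open span is treated as a span start.
        if lab.startswith("B-") or (lab.startswith("I-") and start is None):
            start, etype = i, lab[2:]
    return spans


def precision_recall_f1(gold: List[List[str]],
                        pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 over a list of sentences."""
    gold_spans, pred_spans = set(), set()
    for k, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index to keep spans distinct.
        gold_spans |= {(k, *s) for s in bio_to_spans(g)}
        pred_spans |= {(k, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: the system finds the person but misses the location.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
p, r, f1 = precision_recall_f1(gold, pred)
```

In the toy example one of the two gold entities is recovered with no spurious predictions, so precision is 1.0, recall is 0.5, and F1 is their harmonic mean, 2/3.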

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies