CRF-based Named Entity Recognition @ICON 2013

Arjun Das; Utpal Garain

arXiv:1409.8008·cs.CL·September 30, 2014·5 cites

TL;DR

This paper evaluates CRF-based Named Entity Recognition systems built for the ICON 2013 shared task, using language-independent features plus gazetteers for some languages. F-measures range from 88% for English, 87% for Bengali, and 79% for Hindi down to 69% for Tamil and Telugu, where no gazetteer was used.

Contribution

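It applies one set of language-independent features to CRF-based NER across all languages in the task, adds capitalization as the only language-specific feature (English only), and explores gazetteers built from Wikipedia and other sources for Bengali, Hindi, and English.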

Findings

01

Highest F measure of 88% for English

02

F measure of 87% for Bengali

03

F measure of 79% for Hindi
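
The abstract does not say which variant of the F measure is reported. As a minimal sketch, assuming the conventional balanced F measure (the harmonic mean of precision and recall), the score can be computed from entity counts as follows. The function name and the counts in the usage note are illustrative and are not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F measure (F1) from entity-level counts.

    Assumption: the balanced variant is used; the abstract does not specify.
    """
    predicted = true_positives + false_positives  # entities the system output
    actual = true_positives + false_negatives     # entities in the gold data
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with made-up counts of 88 correct entities, 12 spurious ones, and 12 missed ones, precision and recall are both 0.88, so `f_measure(88, 12, 12)` is also 0.88.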

Abstract

This paper describes the performance of CRF-based systems for Named Entity Recognition (NER) in Indian languages as part of the ICON 2013 shared task. In this task we have considered a set of language-independent features for all the languages. Only for English has a language-specific feature, namely capitalization, been added. Next, the use of gazetteers is explored for Bengali, Hindi and English. The gazetteers are built from Wikipedia and other sources. Test results show that the system achieves the highest F measure of 88% for English and the lowest F measure of 69% for both Tamil and Telugu. Note that no gazetteer was used for the two lowest-performing languages. NER in Bengali and Hindi achieves F measures of 87% and 79%, respectively.
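
The abstract names three kinds of features: language-independent ones, capitalization for English only, and gazetteer membership. It does not list the individual features, so the sketch below is an assumption about what such a feature function might look like, not the authors' feature set. The function names, the specific features (affixes, neighboring words), and the toy gazetteer are all hypothetical.

```python
from typing import Dict, List, Set


def token_features(
    tokens: List[str],
    i: int,
    gazetteer: Set[str],
    use_capitalization: bool = False,
) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i.

    All features here are illustrative assumptions; the paper's actual
    feature set is not given in the abstract.
    """
    word = tokens[i]
    features: Dict[str, object] = {
        # Language-independent surface features (hypothetical choices).
        "word": word,
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "length": len(word),
        "is_digit": word.isdigit(),
        # Context window of one token on each side.
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
        # Gazetteer lookup; pass an empty set for languages without one.
        "in_gazetteer": word in gazetteer,
    }
    if use_capitalization:
        # Language-specific feature, enabled only for English.
        features["is_capitalized"] = word[:1].isupper()
    return features


def sentence_features(
    tokens: List[str],
    gazetteer: Set[str],
    use_capitalization: bool = False,
) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in a sentence."""
    return [
        token_features(tokens, i, gazetteer, use_capitalization)
        for i in range(len(tokens))
    ]
```

Calling `sentence_features(["Utpal", "lives", "in", "Kolkata"], {"Kolkata"}, use_capitalization=True)` yields one dictionary per token, which a CRF trainer could then consume alongside the gold labels. Passing an empty set and leaving `use_capitalization` off mirrors the setting the abstract describes for languages with no gazetteer.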

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Conditional Random Field