Improving Multilingual Named Entity Recognition with Wikipedia Entity Type Mapping

Jian Ni; Radu Florian

arXiv:1707.02459·cs.CL·November 4, 2019

TL;DR

This paper uses Wikipedia to build multilingual entity type mappings that improve the accuracy of NER systems, especially for unseen entities and low-resource languages, with gains of up to 18.3 F1 points.

Contribution

The paper introduces a method for constructing high-coverage multilingual Wikipedia entity type mappings from weakly annotated data, improving NER performance without additional human annotation.

Findings

01 Improved NER accuracy on six languages.

02 Gains of up to 18.3 F1 points for unseen entities.

03 Effective in low-resource and new-domain scenarios.

Abstract

The state-of-the-art named entity recognition (NER) systems are statistical machine learning models that have strong generalization capability (i.e., can recognize unseen entities that do not appear in training data) based on lexical and contextual information. However, such a model could still make mistakes if its features favor a wrong entity type. In this paper, we utilize Wikipedia as an open knowledge base to improve multilingual NER systems. Central to our approach is the construction of high-accuracy, high-coverage multilingual Wikipedia entity type mappings. These mappings are built from weakly annotated data and can be extended to new languages with no human annotation or language-dependent knowledge involved. Based on these mappings, we develop several approaches to improve an NER system. We evaluate the performance of the approaches via experiments on NER systems trained for…
