POLYGLOT-NER: Massive Multilingual Named Entity Recognition
Rami Al-Rfou, Vivek Kulkarni, Bryan Perozzi, Steven Skiena

TL;DR
This paper presents a scalable, minimal-resource approach to building multilingual Named Entity Recognition systems for 40 languages using Wikipedia and Freebase, without relying on language-specific annotated data.
Contribution
The paper introduces a language-agnostic method for constructing NER annotators for 40 languages without requiring human-annotated NER datasets or language-specific resources such as treebanks, parallel corpora, or orthographic rules.
Findings
Achieved competitive NER performance across 40 languages.
Developed a distant evaluation method for low-resource languages.
Demonstrated minimal human intervention in system construction.
Abstract
The increasing diversity of languages used on the web introduces a new level of complexity to Information Retrieval (IR) systems. We can no longer assume that textual content is written in one language or even the same language family. In this paper, we demonstrate how to build massive multilingual annotators with minimal human expertise and intervention. We describe a system that builds Named Entity Recognition (NER) annotators for 40 major languages using Wikipedia and Freebase. Our approach does not require human-annotated NER datasets or language-specific resources like treebanks, parallel corpora, and orthographic rules. The novelty of our approach lies therein: it uses only language-agnostic techniques while achieving competitive performance. Our method learns distributed word representations (word embeddings) which encode semantic and syntactic features of words in each language.…
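As a rough illustration of how NER training labels might be obtained from Wikipedia and Freebase without human annotation, the sketch below tags tokens inside linked text spans with a coarse entity type looked up from a small table. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `ENTITY_TYPES` table, the `label_sentence` function, the segment format, and the PER/LOC/ORG tag names are all hypothetical, and whitespace tokenization is a simplification that would not hold for every language.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a knowledge-base lookup that maps a linked
# page title to a coarse entity tag. The entries are illustrative only.
ENTITY_TYPES: Dict[str, str] = {
    "Barack_Obama": "PER",
    "Chicago": "LOC",
    "Google": "ORG",
}

# A sentence is modeled as a list of segments: (text, link target or None).
Segment = Tuple[str, Optional[str]]


def label_sentence(
    segments: List[Segment], entity_types: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Assign a tag to every token; tokens outside known links get "O"."""
    labeled: List[Tuple[str, str]] = []
    for text, target in segments:
        # Unlinked text, or a link whose target has no known type, is "O".
        tag = entity_types.get(target, "O") if target is not None else "O"
        # Whitespace splitting is an assumption made for brevity.
        for token in text.split():
            labeled.append((token, tag))
    return labeled


example: List[Segment] = [
    ("Barack Obama", "Barack_Obama"),
    ("moved to", None),
    ("Chicago", "Chicago"),
    (".", None),
]
labeled_example = label_sentence(example, ENTITY_TYPES)
```

Token and tag pairs produced this way could then serve as noisy training data for a classifier that takes word embeddings as input. Unlinked mentions of entities are labeled "O" here, which is one reason such automatically generated data is noisy.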
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
