Weakly Supervised Cross-Lingual Named Entity Recognition via Effective Annotation and Representation Projection
Jian Ni, Georgiana Dinu, Radu Florian

TL;DR
This paper introduces two weakly supervised methods for cross-lingual NER that do not require human annotation in the target language, leveraging annotation projection and representation projection techniques.
Contribution
The paper proposes two approaches to cross-lingual NER that need no human annotation in the target language, annotation projection and representation projection, and combines them to improve on existing weakly supervised methods.
Findings
The combined system outperforms other weakly supervised methods on CoNLL data.
A heuristic selection scheme picks good-quality projection-labeled data out of noisy projected data.
Representation projection lets the source-language NER system be applied to the target language without re-training.
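To make the representation-projection finding concrete, here is a minimal sketch, not the authors' implementation. It assumes the mapping from target-language to source-language embeddings is a single linear map fitted by ordinary least squares on a seed dictionary of translation pairs; the summary above does not say which mapping function the paper uses. The names `learn_linear_map` and `project`, the dimensions, and the random data are all hypothetical.

```python
import numpy as np

def learn_linear_map(target_vecs, source_vecs):
    """Fit M minimizing ||target_vecs @ M - source_vecs||_F (assumed objective)."""
    M, *_ = np.linalg.lstsq(target_vecs, source_vecs, rcond=None)
    return M

def project(target_vecs, M):
    """Map target-language embeddings into the source embedding space."""
    return target_vecs @ M

# Synthetic stand-in for a seed dictionary: 50 word pairs,
# 4-dim target embeddings and 3-dim source embeddings (illustrative sizes).
rng = np.random.default_rng(0)
true_map = rng.normal(size=(4, 3))
tgt_seed = rng.normal(size=(50, 4))
src_seed = tgt_seed @ true_map  # noise-free, so the fit can recover true_map

M = learn_linear_map(tgt_seed, src_seed)

# Embeddings of unseen target-language words, projected into source space.
new_tgt = rng.normal(size=(2, 4))
projected = project(new_tgt, M)
```

Once target words live in the source embedding space, a source-language tagger that consumes embeddings can read them directly, which is why no re-training is needed.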
Abstract
The state-of-the-art named entity recognition (NER) systems are supervised machine learning models that require large amounts of manually annotated data to achieve high accuracy. However, annotating NER data by hand is expensive and time-consuming, and can be quite difficult for a new language. In this paper, we present two weakly supervised approaches for cross-lingual NER with no human annotation in a target language. The first approach is to create automatically labeled NER data for a target language via annotation projection on comparable corpora, where we develop a heuristic scheme that effectively selects good-quality projection-labeled data from noisy data. The second approach is to project distributed representations of words (word embeddings) from a target language to a source language, so that the source-language NER system can be applied to the target language without…
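The first approach in the abstract, annotation projection, can be illustrated with a small sketch. It assumes word alignments between a source sentence and its target counterpart are already available as index pairs, and it uses plain entity-type tags rather than a BIO scheme. The function name `project_labels` and the example sentences are hypothetical, and the paper's heuristic for selecting good-quality projected data is not reproduced here.

```python
def project_labels(source_tags, alignment, target_len):
    """Copy each non-O source tag onto the target tokens aligned to it."""
    target_tags = ["O"] * target_len
    for src_idx, tgt_idx in alignment:
        tag = source_tags[src_idx]
        if tag != "O":
            target_tags[tgt_idx] = tag
    return target_tags

# English source sentence with tags from a source-language NER system.
source_tokens = ["Angela", "Merkel", "visited", "Paris"]
source_tags = ["PER", "PER", "O", "LOC"]

# German target sentence with a different word order.
target_tokens = ["Paris", "wurde", "von", "Angela", "Merkel", "besucht"]

# Assumed word alignment as (source index, target index) pairs.
alignment = [(0, 3), (1, 4), (2, 5), (3, 0)]

target_tags = project_labels(source_tags, alignment, len(target_tokens))
print(list(zip(target_tokens, target_tags)))
```

Automatic alignments are noisy in practice, so projected labels like these contain errors, which is the motivation for filtering the projected data before training on it.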
