A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers

Aditi Chaudhary; Jiateng Xie; Zaid Sheikh; Graham Neubig; Jaime G. Carbonell

arXiv:1908.08983·cs.CL·August 27, 2019

TL;DR

This study explores combining cross-lingual transfer and targeted annotation to efficiently develop high-quality named entity recognizers in low-resource languages, reducing annotation effort while maintaining accuracy.

Contribution

The paper demonstrates that a dual-strategy approach, cross-lingual transfer followed by targeted annotation, is the most effective of the methods it compares for low-resource NER.

Findings

01 Cross-lingual transfer is highly effective with minimal annotated data.

02 Targeted annotation of uncertain spans achieves competitive accuracy quickly.

03 Combining transfer and targeted annotation reduces annotation effort tenfold.

Abstract

Most state-of-the-art models for named entity recognition (NER) rely on the availability of large amounts of labeled data, making them challenging to extend to new, lower-resourced languages. However, there are now several proposed approaches involving either cross-lingual transfer learning, which learns from other highly resourced languages, or active learning, which efficiently selects effective training data based on model predictions. This paper poses the question: given this recent progress, and limited human annotation, what is the most effective method for efficiently creating high-quality entity recognizers in under-resourced languages? Based on extensive experimentation using both simulated and real human annotation, we find a dual-strategy approach best, starting with a cross-lingual transferred model, then performing targeted annotation of only uncertain entity spans in the…
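
The "targeted annotation of only uncertain entity spans" step can be illustrated with a minimal sketch. The abstract does not spell out the selection criterion, so the code below assumes one: it groups adjacent tokens whose most likely label is not `O` into candidate spans, scores each span by its mean per-token entropy, and returns the highest-scoring spans for a human to annotate. The function names, the `O` label convention, and the mean-entropy score are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple


def token_entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def select_uncertain_spans(
    label_dists: List[Dict[str, float]], budget: int
) -> List[Tuple[int, int, float]]:
    """Return up to `budget` spans as (start, end, score), most uncertain first.

    Assumption: a candidate span is a maximal run of tokens whose most
    likely label is not "O"; its score is the mean token entropy.
    `end` is exclusive.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, dist in enumerate(label_dists):
        predicted = max(dist, key=dist.get)
        if predicted != "O":
            if start is None:
                start = i  # a new candidate span begins here
        elif start is not None:
            spans.append((start, i))  # the span ended at the previous token
            start = None
    if start is not None:
        spans.append((start, len(label_dists)))

    scored = [
        (s, e, sum(token_entropy(d) for d in label_dists[s:e]) / (e - s))
        for s, e in spans
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:budget]


# Hypothetical model output for a five-token sentence.
example_dists = [
    {"O": 0.90, "PER": 0.10},
    {"O": 0.10, "PER": 0.90},
    {"O": 0.05, "PER": 0.95},
    {"O": 0.80, "LOC": 0.20},
    {"O": 0.45, "LOC": 0.55},
]
to_annotate = select_uncertain_spans(example_dists, budget=1)
```

In this toy input the model is fairly confident about the two-token `PER` span and much less sure about the final `LOC` token, so the final token is the one sent for annotation. In the dual-strategy setting the abstract describes, the label distributions would come from the cross-lingually transferred model rather than from hand-written values.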

Code & Models

Repositories

Aditi138/EntityTargetedActiveLearning (official)