Building Low-Resource NER Models Using Non-Speaker Annotation
Tatiana Tsygankova, Francesca Marini, Stephen Mayhew, Dan Roth

TL;DR
This paper introduces an approach to low-resource NER that uses annotations from non-speakers (annotators with no prior experience in the target language), showing that these annotations produce results comparable or superior to cross-lingual methods and can improve further with additional annotator effort.
Contribution
It proposes non-speaker annotation as a complementary approach to building low-resource NER models and shows that it is effective relative to existing cross-lingual approaches.
Findings
Non-speaker annotations match or outperform cross-lingual methods.
In a controlled experiment, 30 participants annotated Indonesian, Russian, and Hindi.
Additional effort can further improve non-speaker annotation results.
Abstract
In low-resource natural language processing (NLP), the key problems are a lack of target-language training data and a lack of native speakers to create it. Cross-lingual methods have had notable success in addressing these concerns, but in certain common circumstances, such as insufficient pre-training corpora or languages far from the source language, their performance suffers. In this work we propose a complementary approach to building low-resource Named Entity Recognition (NER) models using "non-speaker" (NS) annotations, provided by annotators with no prior experience in the target language. We recruit 30 participants in a carefully controlled annotation experiment with Indonesian, Russian, and Hindi. We show that use of NS annotators produces results that are consistently on par with or better than cross-lingual methods built on modern contextual representations, and have the…
