Learning with Different Amounts of Annotation: From Zero to Many Labels

Shujian Zhang; Chengyue Gong; Eunsol Choi

arXiv:2109.04408·cs.CL·September 14, 2021

Open Access · 1 Repo

TL;DR

This paper investigates how varying the amount of annotation per training example, including multi-label annotations, can improve NLP model performance, especially under limited annotation budgets and across different domains.

Contribution

The paper extends the MixUp data augmentation framework into a learning algorithm that trains on examples with different amounts of annotation: zero, one, or multiple labels.
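One way to picture training on examples with zero, one, or multiple labels is to map every example to a target distribution over classes. The sketch below is illustrative only and is not taken from the paper or its repository: the function name `soft_target` and the `fallback` argument (for instance, a model's own predicted distribution for an unlabeled example) are assumptions made for this example.

```python
from collections import Counter
from typing import List, Optional, Sequence


def soft_target(
    labels: Sequence[int],
    num_classes: int,
    fallback: Optional[Sequence[float]] = None,
) -> List[float]:
    """Turn zero, one, or many annotator labels into a class distribution.

    Hypothetical helper: `fallback` stands in for whatever distribution is
    used when an example has no human label (an assumption, not the paper's
    stated procedure).
    """
    if any(label < 0 or label >= num_classes for label in labels):
        raise ValueError("label outside the range [0, num_classes)")

    if not labels:
        # Zero labels: rely on the supplied fallback, renormalized to sum to 1.
        if fallback is None:
            raise ValueError("an unlabeled example needs a fallback distribution")
        total = sum(fallback)
        return [p / total for p in fallback]

    # One label gives a one-hot vector; many labels give vote proportions.
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in range(num_classes)]


multi = soft_target([0, 0, 1, 2], num_classes=3)      # four annotators
single = soft_target([1], num_classes=3)              # one annotator
unlabeled = soft_target([], num_classes=3, fallback=[2.0, 1.0, 1.0])
```

With targets in this common form, a single loss such as cross-entropy against a soft target can be applied regardless of how many annotations an example received.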

Findings

01 Multi-label annotations on a small subset of training examples improve performance on natural language inference and entity typing.

02 The proposed method outperforms traditional single-label training.

03 Benefits are consistent across tasks and domain shifts.

Abstract

Training NLP systems typically assumes access to annotated data that has a single human label per example. Given imperfect labeling from annotators and the inherent ambiguity of language, we hypothesize that a single label is not sufficient to learn the spectrum of language interpretation. We explore new annotation distribution schemes, assigning multiple labels per example for a small subset of training examples. Introducing such multi-label examples at the cost of annotating fewer examples brings clear gains on a natural language inference task and an entity typing task, even when we simply first train with single-label data and then fine-tune with multi-label examples. Extending a MixUp data augmentation framework, we propose a learning algorithm that can learn from training examples with different amounts of annotation (with zero, one, or multiple labels). This algorithm efficiently combines…
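For readers unfamiliar with MixUp, the following is a minimal sketch of the standard interpolation step applied to examples whose targets are class distributions rather than single labels. It is not the authors' implementation: the function names, the use of plain feature vectors instead of model representations, and the `alpha` value are all assumptions chosen for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_lambda(alpha: float = 0.4, rng: random.Random = None) -> float:
    """Draw the mixing coefficient from Beta(alpha, alpha), as in standard MixUp."""
    rng = rng or random.Random()
    return rng.betavariate(alpha, alpha)


def mixup_pair(
    x_a: Sequence[float], y_a: Sequence[float],
    x_b: Sequence[float], y_b: Sequence[float],
    lam: float,
) -> Tuple[List[float], List[float]]:
    """Linearly interpolate two feature vectors and their target distributions."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def soft_cross_entropy(probs: Sequence[float], target: Sequence[float]) -> float:
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


# A single-label example (one-hot target) mixed with a multi-label example
# (target built from several annotators' votes).
x_mix, y_mix = mixup_pair(
    [1.0, 0.0], [1.0, 0.0, 0.0],
    [0.0, 1.0], [0.0, 0.5, 0.5],
    lam=0.5,
)
loss = soft_cross_entropy([0.6, 0.2, 0.2], y_mix)
```

Because the mixed target is a convex combination of two distributions, it remains a valid distribution, so the same soft cross-entropy applies whether the source examples had one label or many.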

Code & Models

Repositories

szhang42/uneven_training_data (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: MixUp