Local Additivity Based Data Augmentation for Semi-supervised NER

Jiaao Chen; Zhenghui Wang; Ran Tian; Zichao Yang; Diyi Yang

arXiv:2010.01677·cs.CL·October 6, 2020



TL;DR

This paper introduces LADA (Local Additivity based Data Augmentation), a data augmentation technique for semi-supervised NER that interpolates between nearby samples to create virtual training data, improving entity recognition when labeled data is scarce.
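In mixup-style notation (an assumption; these symbols and the Beta-distributed weight are not taken from the text on this page), interpolating two samples yields a virtual token representation and a matching soft label:

```latex
\lambda \sim \mathrm{Beta}(\alpha, \alpha), \qquad \lambda \leftarrow \max(\lambda,\, 1 - \lambda)

\tilde{\mathbf{h}} = \lambda\, \mathbf{h} + (1 - \lambda)\, \mathbf{h}', \qquad
\tilde{\mathbf{y}} = \lambda\, \mathbf{y} + (1 - \lambda)\, \mathbf{y}'
```

Here h and h' stand for hidden representations of two paired tokens and y, y' for their one-hot labels. Per the abstract, Intra-LADA draws the pair from within one sentence and Inter-LADA from two different sentences.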

Contribution

The paper proposes Local Additivity based Data Augmentation (LADA), which performs intra- and inter-sentence interpolation, and extends it to semi-supervised NER with a consistency loss on unlabeled data.
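The two interpolation variants can be sketched on token hidden states and one-hot labels as follows. This is a minimal NumPy illustration under stated assumptions, not the official GT-SALT/LADA code. The helper names are hypothetical, the mixing weight is drawn from a Beta distribution as in mixup, and "sequences close to each other" is approximated by cosine k-nearest neighbours over some sentence embedding.

```python
import numpy as np

# Hypothetical helpers illustrating the two LADA variants. The names and the
# Beta-distributed mixing weight are assumptions, not the official code.

def sample_lambda(alpha: float, rng: np.random.Generator) -> float:
    """Draw a mixing weight from Beta(alpha, alpha) and keep it >= 0.5,
    so the original sample dominates the mix (a mixup-style convention)."""
    lam = float(rng.beta(alpha, alpha))
    return max(lam, 1.0 - lam)


def intra_lada(h: np.ndarray, y: np.ndarray, lam: float, rng: np.random.Generator):
    """Intra-LADA sketch: mix every token with a randomly paired token from
    the SAME sentence. h: (seq_len, dim) hidden states; y: (seq_len, num_labels)
    one-hot labels."""
    perm = rng.permutation(h.shape[0])  # random token pairing within the sentence
    return lam * h + (1 - lam) * h[perm], lam * y + (1 - lam) * y[perm]


def knn_partner(idx: int, sent_embs: np.ndarray, k: int, rng: np.random.Generator) -> int:
    """Pick a partner for sentence `idx` uniformly among its k most
    cosine-similar sentences (excluding itself), i.e. a 'nearby' sequence."""
    unit = sent_embs / np.linalg.norm(sent_embs, axis=1, keepdims=True)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # never pair a sentence with itself
    top_k = np.argsort(-sims)[:k]
    return int(rng.choice(top_k))


def inter_lada(h_a, y_a, h_b, y_b, lam: float):
    """Inter-LADA sketch: mix two different sentences position by position.
    Assumes both are padded to the same length."""
    return lam * h_a + (1 - lam) * h_b, lam * y_a + (1 - lam) * y_b
```

In training, the mixed states would be fed through the rest of the tagger with the soft labels as targets. Which encoder layer the mix is applied at is a further design choice that this page does not specify.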

Findings

1. LADA outperforms several strong baselines on NER benchmarks.
2. Interpolation-based augmentation enhances entity and context learning.
3. The method effectively reduces reliance on labeled data.

Abstract

Named Entity Recognition (NER) is one of the first stages in deep language understanding, yet current NER models heavily rely on human-annotated data. In this work, to alleviate the dependence on labeled data, we propose a Local Additivity based Data Augmentation (LADA) method for semi-supervised NER, in which we create virtual samples by interpolating sequences close to each other. Our approach has two variations: Intra-LADA and Inter-LADA, where Intra-LADA performs interpolations among tokens within one sentence, and Inter-LADA samples different sentences to interpolate. Through linear additions between sampled training data, LADA creates an infinite amount of labeled data and improves both entity and context learning. We further extend LADA to the semi-supervised setting by designing a novel consistency loss for unlabeled data. Experiments conducted on two NER benchmarks demonstrate…
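The abstract states only that LADA adds a consistency loss for unlabeled data; its exact form is not given here. As a hedged illustration of one way such a loss can work without token alignment between an unlabeled sentence and an augmented view of it, the sketch below compares soft per-label entity counts. The function names, the count-based design, and the squared-error penalty are all assumptions; the actual loss is in the GT-SALT/LADA repository.

```python
import numpy as np

# Hypothetical consistency loss for unlabeled sentences: compare the expected
# number of tokens per entity label across two views of the same sentence.
# This is an illustrative design, not necessarily the loss used in LADA.

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def expected_label_counts(logits: np.ndarray, entity_cols) -> np.ndarray:
    """Soft count of tokens predicted as each entity label.
    logits: (seq_len, num_labels); entity_cols: label indices other than 'O'."""
    return softmax(logits)[:, entity_cols].sum(axis=0)


def count_consistency_loss(logits_orig, logits_aug, entity_cols) -> float:
    """Mean squared gap between the soft label counts of the two views.
    Because it compares sentence-level counts, the views may differ in length."""
    gap = expected_label_counts(logits_orig, entity_cols) - expected_label_counts(logits_aug, entity_cols)
    return float(np.mean(gap ** 2))
```

Comparing counts rather than per-token distributions is what lets the two views have different lengths or word orders; a per-token KL penalty, by contrast, would require aligned tokens.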


Code & Models

Repositories

GT-SALT/LADA
PyTorch · Official


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification