Local Additivity Based Data Augmentation for Semi-supervised NER
Jiaao Chen, Zhenghui Wang, Ran Tian, Zichao Yang, Diyi Yang

TL;DR
This paper introduces LADA, a novel data augmentation technique for semi-supervised NER that interpolates between samples to generate virtual data, improving entity recognition with minimal labeled data.
Contribution
The paper proposes Local Additivity based Data Augmentation (LADA), a method for semi-supervised NER with intra-sentence and inter-sentence interpolation variants, and extends it to unlabeled data with a consistency loss.
Findings
LADA outperforms several strong baselines on two NER benchmarks.
Interpolation-based augmentation enhances entity and context learning.
The method effectively reduces reliance on labeled data.
Abstract
Named Entity Recognition (NER) is one of the first stages in deep language understanding, yet current NER models heavily rely on human-annotated data. In this work, to alleviate the dependence on labeled data, we propose a Local Additivity based Data Augmentation (LADA) method for semi-supervised NER, in which we create virtual samples by interpolating sequences close to each other. Our approach has two variations: Intra-LADA and Inter-LADA, where Intra-LADA performs interpolations among tokens within one sentence, and Inter-LADA samples different sentences to interpolate. Through linear additions between sampled training data, LADA creates an infinite amount of labeled data and improves both entity and context learning. We further extend LADA to the semi-supervised setting by designing a novel consistency loss for unlabeled data. Experiments conducted on two NER benchmarks demonstrate…
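The interpolation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say where the interpolation is applied or how the mixing weight is chosen, so the code below assumes per-token feature vectors with one-hot labels, a caller-supplied weight `lam`, and a random token permutation for the intra-sentence case. All function names are hypothetical.

```python
import random


def interpolate(a, b, lam):
    """Element-wise linear interpolation lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mix_sequences(feats_a, labels_a, feats_b, labels_b, lam):
    """Interpolate two token sequences position by position.

    feats_*  : list of per-token feature vectors (assumed representation)
    labels_* : list of per-token one-hot label vectors
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("sequences must have equal length (pad beforehand)")
    mixed_feats = [interpolate(fa, fb, lam) for fa, fb in zip(feats_a, feats_b)]
    mixed_labels = [interpolate(la, lb, lam) for la, lb in zip(labels_a, labels_b)]
    return mixed_feats, mixed_labels


def intra_mix(feats, labels, lam, rng):
    """Intra-sentence variant (assumed form): mix a sentence with a
    randomly permuted copy of its own tokens."""
    perm = list(range(len(feats)))
    rng.shuffle(perm)
    return mix_sequences(
        feats, labels,
        [feats[i] for i in perm], [labels[i] for i in perm],
        lam,
    )


def inter_mix(feats_a, labels_a, feats_b, labels_b, lam):
    """Inter-sentence variant: mix two different sentences.
    How the second sentence is sampled is left to the caller."""
    return mix_sequences(feats_a, labels_a, feats_b, labels_b, lam)


# Two toy sentences of two tokens each, with two label classes.
sent_a = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
sent_b = ([[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]])
virtual_feats, virtual_labels = inter_mix(*sent_a, *sent_b, lam=0.75)
intra_feats, intra_labels = intra_mix(*sent_a, lam=0.5, rng=random.Random(0))
```

Because the labels are interpolated with the same weight as the features, each virtual token carries a soft label that still sums to one, which is what lets the mixed samples be used as additional training data.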
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
