RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named Entity Recognition

Sihan Song; Furao Shen; Jian Zhao

arXiv:2307.07417·cs.CL·July 18, 2023

TL;DR

RoPDA introduces a robust prompt-based data augmentation method for low-resource NER that improves performance by generating high-quality augmented data and effectively utilizing unlabeled data.

Contribution

It proposes a novel prompt-based augmentation framework with self-filtering and mixup techniques to enhance low-resource NER performance.

Findings

01. Significant performance improvements over strong baselines.
02. Outperforms state-of-the-art semi-supervised methods with unlabeled data.
03. Effective augmentation operations that preserve label integrity.

Abstract

Data augmentation has been widely used in low-resource NER tasks to tackle the problem of data sparsity. However, previous data augmentation methods have the disadvantages of disrupted syntactic structures, token-label mismatch, and requirement for external knowledge or manual effort. To address these issues, we propose Robust Prompt-based Data Augmentation (RoPDA) for low-resource NER. Based on pre-trained language models (PLMs) with continuous prompt, RoPDA performs entity augmentation and context augmentation through five fundamental augmentation operations to generate label-flipping and label-preserving examples. To optimize the utilization of the augmented samples, we present two techniques: Self-Consistency Filtering and mixup. The former effectively eliminates low-quality samples, while the latter prevents performance degradation arising from the direct utilization of…
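
The abstract names mixup as one of the two techniques for using augmented samples. As a rough illustration only, the sketch below shows the generic mixup interpolation between an original and an augmented example. The abstract does not say at which layer RoPDA mixes representations or how it chooses the mixing weight, so the function name `mixup`, the `alpha` parameter, the Beta-distributed weight, and the toy vectors are all assumptions made for this example, not the authors' implementation.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.5, rng=None):
    """Linearly interpolate two (feature vector, soft label) pairs.

    Hypothetical helper: the weight lam is drawn from Beta(alpha, alpha),
    as in the generic mixup formulation. The choice of alpha is an assumption.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    # The same weight is applied to features and labels so they stay aligned.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Toy data: a 3-dimensional token representation with a one-hot label
# over two classes, for an original and an augmented sample.
original = ([1.0, 0.0, 2.0], [1.0, 0.0])
augmented = ([0.0, 1.0, 0.0], [0.0, 1.0])

x_mix, y_mix, lam = mixup(*original, *augmented, rng=random.Random(0))
print(f"lam={lam:.3f}", x_mix, y_mix)
```

Because the mixed label is a convex combination of the two one-hot labels, it remains a valid probability distribution, which is what lets a model train on the blended sample rather than on the augmented sample alone.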

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies