Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime

Junfan Chen; Richong Zhang; Zheyan Luo; Chunming Hu; Yongyi Mao

arXiv:2305.09287 · cs.CL · August 10, 2023 · 2 citations

TL;DR

This paper introduces Adversarial Word Dilution (AWD), a novel data augmentation technique for low-resource text classification that creates hard positive examples by adversarially diluting strong positive words, leading to improved model performance.

Contribution

The paper proposes AWD, a new adversarial augmentation method that generates interpretable hard positive examples by diluting key words, outperforming existing augmentation techniques in low-resource settings.

Findings

01. AWD outperforms state-of-the-art augmentation methods on benchmark datasets.

02. Generated augmentations are interpretable and adaptable to new data.

03. AWD enhances low-resource text classification accuracy.

Abstract

Data augmentation is widely used in text classification, especially in the low-resource regime, where only a few examples per class are available during training. Despite this success, generating augmentations as hard positive examples, which may increase their effectiveness, remains under-explored. This paper proposes Adversarial Word Dilution (AWD), a method that generates hard positive examples as text data augmentations to train low-resource text classification models efficiently. Our idea is to dilute the embeddings of strong positive words by weighted mixing with the unknown-word embedding, making the augmented inputs hard for the classification model to recognize as positive. We adversarially learn the dilution weights through a constrained min-max optimization process guided by the labels. Empirical studies on three benchmark datasets show…
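The dilution step described in the abstract can be illustrated with a minimal sketch. It assumes the "weighted mixing" is a convex combination of each word embedding with the unknown-word embedding, which the abstract does not spell out. The function name `dilute`, the toy vectors, and the hand-picked weights are all hypothetical; in the paper the weights are learned adversarially, which this sketch does not attempt.

```python
from typing import List

Vector = List[float]


def dilute(word_embs: List[Vector], unk_emb: Vector, weights: List[float]) -> List[Vector]:
    """Mix each word embedding with the unknown-word embedding.

    Assumed form (not confirmed by the abstract): a convex combination
        diluted_i = (1 - w_i) * e_i + w_i * e_unk
    where w_i in [0, 1] is the dilution weight of word i.
    w_i = 0 keeps the word unchanged; w_i = 1 replaces it entirely.
    """
    if len(word_embs) != len(weights):
        raise ValueError("need exactly one dilution weight per word")
    diluted = []
    for emb, w in zip(word_embs, weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("dilution weights must lie in [0, 1]")
        if len(emb) != len(unk_emb):
            raise ValueError("embedding dimensions must match")
        diluted.append([(1.0 - w) * e + w * u for e, u in zip(emb, unk_emb)])
    return diluted


# Toy 2-dimensional embeddings for a three-word sentence (illustrative values).
sentence = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
unk = [0.0, 0.0]

# Hand-picked weights: the third word is treated as the "strong positive"
# word and is diluted fully; the first word is left untouched.
weights = [0.0, 0.5, 1.0]

augmented = dilute(sentence, unk, weights)
print(augmented)
```

In this toy setup, a larger weight pushes a word closer to the unknown-word embedding, so the classifier sees less of that word's signal while the label stays the same. That is the sense in which the augmented input is a hard positive example.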

Code & Models

Repositories

BDBC-KG-NLP/AAAI2023_AWD
PyTorch · Official

Videos

Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime · Underline

Taxonomy

Topics: Topic Modeling