CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Yanru Qu; Dinghan Shen; Yelong Shen; Sandra Sajeev; Jiawei Han; Weizhu Chen

arXiv:2010.08670·cs.CL·October 20, 2020·28 cites

TL;DR

CoDA is a novel data augmentation framework for NLP that combines multiple transformations with contrastive regularization, improving model performance on various natural language understanding tasks, especially in low-resource settings.

Contribution

The paper introduces CoDA, a new data augmentation method integrating diverse transformations and contrastive learning to enhance NLP model generalization.

Findings

01 Achieves an average improvement of 2.2% on the GLUE benchmark with RoBERTa-large.

02 Outperforms several baselines, including adversarial training.

03 The contrastive objective improves a variety of augmentation methods.

Abstract

Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends to be more challenging. In this paper, we propose a novel data augmentation framework dubbed CoDA, which synthesizes diverse and informative augmented examples by integrating multiple transformations organically. Moreover, a contrastive regularization objective is introduced to capture the global relationship among all the data samples. A momentum encoder along with a memory bank is further leveraged to better estimate the contrastive loss. To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks. On the GLUE benchmark, CoDA gives rise to an average…
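The abstract names three ingredients for the contrastive regularizer: a contrastive loss, a momentum encoder, and a memory bank. The sketch below shows one common way these pieces fit together, assuming an InfoNCE-style loss with a fixed-size queue of past keys as negatives. It is not the authors' implementation: the function names, the temperature of 0.1, the momentum of 0.99, and the use of plain Python lists in place of Transformer encoders are all illustrative assumptions.

```python
import math
import random
from collections import deque


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, negative_keys, temperature=0.1):
    """InfoNCE-style loss: cross-entropy of picking the positive key
    among the positive plus all negatives (temperature is an assumed value)."""
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


def momentum_update(key_params, query_params, momentum=0.99):
    """Move the key (momentum) encoder's parameters slowly toward the
    query encoder's parameters via an exponential moving average."""
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]


class MemoryBank:
    """Fixed-size FIFO queue of past key embeddings, reused as negatives."""

    def __init__(self, size):
        self.queue = deque(maxlen=size)

    def enqueue(self, key):
        self.queue.append(list(key))

    def negatives(self):
        return list(self.queue)


# Toy usage: random vectors stand in for sentence embeddings.
random.seed(0)
bank = MemoryBank(size=4)
for _ in range(6):  # the two oldest keys are evicted
    bank.enqueue([random.gauss(0, 1) for _ in range(8)])

query = [random.gauss(0, 1) for _ in range(8)]
aligned = info_nce_loss(query, query, bank.negatives())
opposed = info_nce_loss(query, [-x for x in query], bank.negatives())
print(aligned < opposed)
```

In this toy setup the loss is lower when the positive key points in the same direction as the query than when it points the opposite way, which is the behavior a contrastive objective is meant to reward. In a real model the query would come from the trained encoder applied to an example, and the positive key from the momentum encoder applied to its augmented version; that pairing is assumed here and is not spelled out in the abstract.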

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis