A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation

Dinghan Shen; Mingzhi Zheng; Yelong Shen; Yanru Qu; Weizhu Chen

arXiv:2009.13818 · cs.CL · October 26, 2020 · 94 cites

TL;DR

This paper introduces a simple data augmentation method called cutoff, which erases parts of an input sentence during fine-tuning to improve natural language understanding and generation, achieving results competitive with or superior to those of more complex adversarial approaches.

Contribution

The paper proposes a stochastic data augmentation technique called cutoff that enhances NLP models' performance with minimal computational overhead.
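
As a rough illustration of what "stochastic" erasing can look like, the sketch below zeroes out part of a sentence's embedding matrix in three ways: individual token positions, individual feature dimensions, or one contiguous span. The function names, the `ratio` parameter, and the choice to erase by writing zeros are assumptions made for this example, not the authors' code.

```python
import random

def token_cutoff(embeddings, ratio, rng):
    """Zero out a random subset of token positions (rows)."""
    n = len(embeddings)
    dropped = set(rng.sample(range(n), int(n * ratio)))
    return [[0.0] * len(row) if i in dropped else list(row)
            for i, row in enumerate(embeddings)]

def feature_cutoff(embeddings, ratio, rng):
    """Zero out a random subset of embedding dimensions (columns)."""
    dim = len(embeddings[0])
    dropped = set(rng.sample(range(dim), int(dim * ratio)))
    return [[0.0 if j in dropped else v for j, v in enumerate(row)]
            for row in embeddings]

def span_cutoff(embeddings, ratio, rng):
    """Zero out one contiguous span of token positions (rows)."""
    n = len(embeddings)
    span = int(n * ratio)
    start = rng.randint(0, n - span)  # randint bounds are inclusive
    return [[0.0] * len(row) if start <= i < start + span else list(row)
            for i, row in enumerate(embeddings)]

# Hypothetical input: 10 tokens with 4-dimensional embeddings.
rng = random.Random(0)
sentence = [[1.0, 2.0, 3.0, 4.0] for _ in range(10)]
views = [
    token_cutoff(sentence, 0.3, rng),
    feature_cutoff(sentence, 0.5, rng),
    span_cutoff(sentence, 0.3, rng),
]
```

Each call only draws a few random indices and copies the input, which is the sense in which this kind of augmentation needs no extra forward or backward pass to choose a perturbation direction.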

Findings

01. Cutoff performs on par with or better than adversarial methods on GLUE.

02. Significant BLEU score improvements in machine translation.

03. State-of-the-art results on the IWSLT2014 German-English dataset.

Abstract

Adversarial training has been shown effective at endowing the learned representations with stronger generalization ability. However, it typically requires expensive computation to determine the direction of the injected perturbations. In this paper, we introduce a set of simple yet effective data augmentation strategies dubbed cutoff, where part of the information within an input sentence is erased to yield its restricted views (during the fine-tuning stage). Notably, this process relies merely on stochastic sampling and thus adds little computational overhead. A Jensen-Shannon Divergence consistency loss is further utilized to incorporate these augmented samples into the training objective in a principled manner. To verify the effectiveness of the proposed strategies, we apply cutoff to both natural language understanding and generation problems. On the GLUE benchmark, it is…
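
The Jensen-Shannon consistency term mentioned above can be sketched as follows. This is a minimal example that assumes the loss is the generalized Jensen-Shannon divergence over the model's predicted class distributions for the original sample and its cutoff views: the average KL divergence from each distribution to their mean. The function names are hypothetical, and how the term is weighted against the task loss is not shown.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_consistency(distributions):
    """Average KL divergence from each distribution to their mean."""
    k = len(distributions)
    mean = [sum(column) / k for column in zip(*distributions)]
    return sum(kl_divergence(p, mean) for p in distributions) / k

# Hypothetical predictions over 3 classes: the original input and two cutoff views.
predictions = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.65, 0.20, 0.15],
]
penalty = js_consistency(predictions)
```

The penalty is zero when all views receive the same prediction and grows as the predictions disagree, so minimizing it pushes the model toward outputs that are stable under erasure.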

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Dropout · Layer Normalization · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Attention Is All You Need