A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation
Dinghan Shen, Mingzhi Zheng, Yelong Shen, Yanru Qu, Weizhu Chen

TL;DR
This paper introduces a simple data augmentation method called cutoff, which erases parts of input sentences to improve natural language understanding and generation, achieving competitive or superior results to more complex adversarial approaches.
Contribution
The paper proposes a stochastic data augmentation technique called cutoff that enhances NLP models' performance with minimal computational overhead.
Findings
Cutoff performs on par or better than adversarial methods on GLUE.
Significant BLEU score improvements in machine translation.
State-of-the-art results on IWSLT2014 German-English dataset.
Abstract
Adversarial training has been shown effective at endowing the learned representations with stronger generalization ability. However, it typically requires expensive computation to determine the direction of the injected perturbations. In this paper, we introduce a set of simple yet effective data augmentation strategies dubbed cutoff, where part of the information within an input sentence is erased to yield its restricted views (during the fine-tuning stage). Notably, this process relies merely on stochastic sampling and thus adds little computational overhead. A Jensen-Shannon Divergence consistency loss is further utilized to incorporate these augmented samples into the training objective in a principled manner. To verify the effectiveness of the proposed strategies, we apply cutoff to both natural language understanding and generation problems. On the GLUE benchmark, it is…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
MethodsLinear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Dropout · Layer Normalization · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Attention Is All You Need
