R-Drop: Regularized Dropout for Neural Networks

Xiaobo Liang; Lijun Wu; Juntao Li; Yue Wang; Qi Meng; Tao Qin; Wei Chen; Min Zhang; Tie-Yan Liu

arXiv:2106.14448 · cs.LG · November 1, 2021 · 306 cites

TL;DR

R-Drop introduces a regularization method that enhances dropout by encouraging consistent output distributions from sub-models, leading to improved performance across diverse NLP and vision tasks, including state-of-the-art results in machine translation.

Contribution

It proposes R-Drop, a novel regularization strategy that enforces output consistency between dropout sub-models, improving neural network training and performance.

Findings

01

R-Drop improves results on 5 deep learning tasks across 18 datasets.

02

It achieves state-of-the-art BLEU scores on WMT14 translation tasks.

03

R-Drop enhances fine-tuning of large pre-trained models like BART and RoBERTa.

Abstract

Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy upon dropout in model training, namely R-Drop, which forces the output distributions of different sub models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the freedom of the model parameters and complements dropout. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial…
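The objective described in the abstract can be illustrated with a small, dependency-free sketch. It runs the same input twice through a toy linear classifier with independently sampled dropout masks, then combines the negative log-likelihood of both passes with the bidirectional KL-divergence between their output distributions. The toy model, the helper names, the weight `alpha`, and the halving of the KL term are assumptions made for illustration; the abstract itself only states that the bidirectional KL-divergence between two dropout sub models is minimized.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))


def forward_with_dropout(weights, x, drop_prob, rng):
    """Toy linear classifier (hypothetical) with inverted dropout on the inputs.

    Each call samples a fresh mask, so two calls yield two different sub models.
    """
    keep = 1.0 - drop_prob
    masked = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    return [sum(w * xi for w, xi in zip(row, masked)) for row in weights]


def r_drop_loss(logits_a, logits_b, target, alpha=1.0):
    """NLL of both passes plus alpha times the averaged bidirectional KL.

    alpha and the 0.5 factor are illustrative assumptions, not taken from the abstract.
    Returns (total_loss, symmetric_kl).
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    nll = -(math.log(p_a[target]) + math.log(p_b[target]))
    sym_kl = 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))
    return nll + alpha * sym_kl, sym_kl


rng = random.Random(0)
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # 2 classes, 3 input features
x = [1.0, 2.0, 3.0]

# Two forward passes over the same sample, each with its own dropout mask.
logits_a = forward_with_dropout(weights, x, 0.3, rng)
logits_b = forward_with_dropout(weights, x, 0.3, rng)
loss, sym_kl = r_drop_loss(logits_a, logits_b, target=1)
print(f"loss={loss:.4f}  symmetric_kl={sym_kl:.4f}")
```

When the two passes happen to produce identical distributions, the KL term vanishes and only the likelihood terms remain; the regularizer penalizes exactly the disagreement that dropout randomness introduces.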

Code & Models

Videos

R-Drop: Regularized Dropout for Neural Networks · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Layer Normalization · Label Smoothing