Fraternal Dropout
Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio

TL;DR
Fraternal Dropout introduces a regularization technique for RNNs that improves robustness and performance by training two shared-parameter copies with different dropout masks and minimizing their prediction differences.
Contribution
The paper proposes fraternal dropout, a novel regularization method for RNNs that enhances invariance to dropout masks and achieves state-of-the-art results on multiple benchmarks.
Findings
Achieved state-of-the-art language modeling results, at the time of publication, on Penn Treebank and WikiText-2.
Improved performance in image captioning on Microsoft COCO.
Enhanced semi-supervised learning on CIFAR-10.
Abstract
Recurrent neural networks (RNNs) are an important class of architectures among neural networks, useful for language modeling and sequential prediction. However, optimizing RNNs is known to be harder than optimizing feed-forward neural networks. A number of techniques have been proposed in the literature to address this problem. In this paper we propose a simple technique called fraternal dropout that takes advantage of dropout to achieve this goal. Specifically, we propose to train two identical copies of an RNN (that share parameters) with different dropout masks while minimizing the difference between their (pre-softmax) predictions. In this way our regularization encourages the representations of RNNs to be invariant to the dropout mask, and thus robust. We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to…
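To make the training objective described in the abstract concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation. Several choices are assumptions made for illustration: a single linear layer stands in for the RNN, dropout is applied to the input features, the "difference" between pre-softmax predictions is measured as a mean squared difference, the two target losses are averaged, and the regularization weight is called `kappa`. All function names are hypothetical.

```python
import math
import random


def dropout_mask(size, keep_prob, rng):
    """Sample an inverted-dropout mask: kept units are scaled by 1/keep_prob."""
    return [(1.0 / keep_prob) if rng.random() < keep_prob else 0.0
            for _ in range(size)]


def logits(weights, features, mask):
    """Pre-softmax scores of a one-layer model with dropout on its input.

    Assumption: this linear layer is only a stand-in for the RNN.
    """
    dropped = [x * m for x, m in zip(features, mask)]
    return [sum(w * x for w, x in zip(row, dropped)) for row in weights]


def cross_entropy(scores, target):
    """Negative log-likelihood of `target` under softmax(scores)."""
    peak = max(scores)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_norm - scores[target]


def fraternal_loss(weights, features, target, keep_prob, kappa, rng):
    """Return (total loss, target loss, penalty) for one example."""
    # Two forward passes through the SAME weights, each with its own mask.
    mask_a = dropout_mask(len(features), keep_prob, rng)
    mask_b = dropout_mask(len(features), keep_prob, rng)
    scores_a = logits(weights, features, mask_a)
    scores_b = logits(weights, features, mask_b)

    # Assumption: the two target losses are averaged.
    target_loss = 0.5 * (cross_entropy(scores_a, target)
                         + cross_entropy(scores_b, target))

    # Assumption: mean squared difference between pre-softmax predictions.
    penalty = sum((a - b) ** 2 for a, b in zip(scores_a, scores_b)) / len(scores_a)

    return target_loss + kappa * penalty, target_loss, penalty


# Illustrative values only; they do not come from the paper.
rng = random.Random(0)
weights = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]]
features = [1.0, 2.0, 3.0]
total, target_loss, penalty = fraternal_loss(
    weights, features, target=0, keep_prob=0.5, kappa=0.1, rng=rng)
print(total, target_loss, penalty)
```

With `keep_prob=1.0` both masks are identical, so the penalty vanishes and the total reduces to the ordinary cross-entropy. The penalty is nonzero only when the two masks produce different predictions, which is what pushes the model toward mask-invariant outputs.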
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Fraternal Dropout · Dropout
