Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

Yeming Wen; Paul Vicol; Jimmy Ba; Dustin Tran; Roger Grosse

arXiv:1803.04386 · cs.LG · April 3, 2018 · 175 cites

TL;DR

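Flipout decorrelates the gradients within a mini-batch by implicitly sampling a pseudo-independent weight perturbation for each example. This restores the variance reduction that large mini-batches should provide, speeds up training with multiplicative Gaussian perturbations, and regularizes LSTMs better than previous methods. It applies wherever stochastic weights are used, including Bayesian neural nets and exploration in reinforcement learning.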

Contribution

We introduce flipout, an efficient technique for pseudo-independent weight perturbations, enabling better variance reduction and faster training across various neural network architectures.

Findings

01

Achieves the ideal linear variance reduction for fully connected networks, convolutional networks, and RNNs

02

Provides significant speedups when training neural networks with multiplicative Gaussian perturbations

03

Outperforms previous regularization methods for LSTMs

Abstract

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a mini-batch typically share the same weight perturbation, thereby limiting the variance reduction effect of large mini-batches. We introduce flipout, an efficient method for decorrelating the gradients within a mini-batch by implicitly sampling pseudo-independent weight perturbations for each example. Empirically, flipout achieves the ideal linear variance reduction for fully connected networks, convolutional networks, and RNNs. We find significant speedups in training neural networks with multiplicative Gaussian perturbations. We show that flipout is effective at regularizing LSTMs, and outperforms previous methods. Flipout also enables us…
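
A minimal NumPy sketch of the idea, for a fully connected layer only. It assumes the construction described in the full paper, which the abstract does not spell out: one perturbation sample shared by the whole mini-batch is multiplied elementwise, per example, by a rank-one matrix of random signs. This is valid when the perturbation distribution is symmetric around zero and independent across weights. The function names (`sample_signs`, `flipout_dense`, `explicit_per_example`) are hypothetical and do not come from any released code.

```python
import numpy as np


def sample_signs(rng, n, d_in, d_out):
    """Draw one pair of random +/-1 sign vectors per example (hypothetical helper)."""
    s = rng.choice([-1.0, 1.0], size=(n, d_in))   # input-side signs
    r = rng.choice([-1.0, 1.0], size=(n, d_out))  # output-side signs
    return s, r


def flipout_dense(x, w_mean, delta_w, s, r):
    """Batched forward pass with a pseudo-independent perturbation per example.

    x:       (n, d_in)     mini-batch of inputs
    w_mean:  (d_in, d_out) mean weights
    delta_w: (d_in, d_out) ONE perturbation sample shared by the batch,
             assumed symmetric around zero
    s, r:    per-example sign vectors from sample_signs

    Example i implicitly uses w_mean + delta_w * outer(s[i], r[i]),
    but only two matrix products are needed for the whole batch.
    """
    return x @ w_mean + ((x * s) @ delta_w) * r


def explicit_per_example(x, w_mean, delta_w, s, r):
    """Slow reference: build each example's weight matrix explicitly."""
    rows = []
    for i in range(x.shape[0]):
        w_i = w_mean + delta_w * np.outer(s[i], r[i])
        rows.append(x[i] @ w_i)
    return np.stack(rows)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))
    w_mean = rng.normal(size=(3, 2))
    delta_w = 0.1 * rng.normal(size=(3, 2))  # e.g. a Gaussian perturbation sample
    s, r = sample_signs(rng, 4, 3, 2)
    fast = flipout_dense(x, w_mean, delta_w, s, r)
    slow = explicit_per_example(x, w_mean, delta_w, s, r)
    print(np.allclose(fast, slow))
```

The batched form never materializes a separate weight matrix for each example, which is what keeps the per-example perturbations cheap. The sign flips make the perturbations pseudo-independent rather than fully independent, because every example still reuses the same `delta_w` magnitudes.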

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques