Backdrop: Stochastic Backpropagation

Siavash Golkar; Kyle Cranmer

arXiv:1806.01337·stat.ML·June 6, 2018·1 cites

TL;DR

Backdrop is a method that applies stochastic masking only during backpropagation. By randomly masking parts of the backward gradient, it improves generalization, particularly on data with a multi-scale, hierarchical structure.

Contribution

The paper presents a simple, flexible masking layer that acts as the identity in the forward pass and randomly masks gradients in the backward pass, improving generalization on multi-scale, hierarchical data.

Findings

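01 Backdrop leads to significant improvements in generalization in the reported experiments.

02 It is well suited to data with a multi-scale, hierarchical structure.

03 It can be applied to problems with non-decomposable loss functions, where standard SGD methods are not well suited.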

Abstract

We introduce backdrop, a flexible and simple-to-implement method, intuitively described as dropout acting only along the backpropagation pipeline. Backdrop is implemented via one or more masking layers which are inserted at specific points along the network. Each backdrop masking layer acts as the identity in the forward pass, but randomly masks parts of the backward gradient propagation. Intuitively, inserting a backdrop layer after any convolutional layer leads to stochastic gradients corresponding to features of that scale. Therefore, backdrop is well suited for problems in which the data have a multi-scale, hierarchical structure. Backdrop can also be applied to problems with non-decomposable loss functions where standard SGD methods are not well suited. We perform a number of experiments and demonstrate that backdrop leads to significant improvements in generalization.
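
The masking-layer behaviour described in the abstract (identity in the forward pass, random masking of the gradient in the backward pass) can be illustrated with a small framework-free sketch. This is not the authors' implementation: the class name `BackdropMask`, the `drop_prob` and `seed` parameters, the choice to mask whole per-sample gradient rows, and the absence of any rescaling of the surviving gradients are all assumptions made for illustration. In an autograd framework the same idea would live in a custom backward operation.

```python
import random


class BackdropMask:
    """Hypothetical masking layer: identity forward, random masking backward.

    Assumption: masking is applied per sample (one gradient row per batch
    element) and kept gradients are not rescaled.
    """

    def __init__(self, drop_prob, seed=None):
        if not 0.0 <= drop_prob < 1.0:
            raise ValueError("drop_prob must be in [0, 1)")
        self.drop_prob = drop_prob
        self.rng = random.Random(seed)

    def forward(self, activations):
        # Identity: the forward pass is left untouched.
        return activations

    def backward(self, grad_rows):
        # grad_rows holds one list of gradient values per sample.
        masked = []
        for row in grad_rows:
            keep = self.rng.random() >= self.drop_prob
            # A dropped sample contributes a zero gradient upstream.
            masked.append(list(row) if keep else [0.0] * len(row))
        return masked


layer = BackdropMask(drop_prob=0.5, seed=0)
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
incoming = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

outputs = layer.forward(batch)    # same values as `batch`
grads = layer.backward(incoming)  # some rows zeroed at random
```

Because the forward pass is the identity, the loss value is computed on the full batch; only the gradient that flows back through the layer becomes stochastic.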

Code & Models

Repositories

dexgen/backdrop
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Dropout · Stochastic Gradient Descent