meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Xu Sun; Xuancheng Ren; Shuming Ma; Houfeng Wang

arXiv:1706.06197·cs.LG·March 12, 2019·84 cites

TL;DR

This paper introduces meProp, a sparsified back propagation technique that keeps only the top-$k$ gradient elements by magnitude, so each backward pass updates just 1-4% of the weights. This cuts the cost of back propagation, reduces overfitting, and improves model accuracy.

Contribution

The paper presents a sparsification method for back propagation that keeps only the top-$k$ gradient elements by magnitude, leading to faster training and better generalization.

Findings

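01 Updating only 1-4% of the weights at each back propagation pass suffices for effective training, without increasing the number of training iterations.

02 The computational cost of back propagation falls linearly, in proportion to $k$ divided by the vector dimension.

03 Model accuracy improves rather than degrades with sparsified updates.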
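
The linear reduction in finding 02 can be made concrete for a single linear layer. The following is only a sketch under assumed notation: the symbols $W$, $x$, $g$, $\hat{g}$ and the sizes $n = 500$, $k = 20$ are illustrative choices, not values taken from this page, and the cost of the top-$k$ selection itself is ignored.

$$
\begin{aligned}
y &= Wx, \qquad W \in \mathbb{R}^{n \times m},\; x \in \mathbb{R}^{m},\; g = \frac{\partial \mathcal{L}}{\partial y} \in \mathbb{R}^{n} \\
\text{dense:}\quad & \frac{\partial \mathcal{L}}{\partial W} = g\,x^{\top}, \quad \frac{\partial \mathcal{L}}{\partial x} = W^{\top} g \quad\Rightarrow\quad 2nm \text{ multiplications} \\
\text{top-}k\text{:}\quad & \frac{\partial \mathcal{L}}{\partial W} \approx \hat{g}\,x^{\top}, \quad \frac{\partial \mathcal{L}}{\partial x} \approx W^{\top} \hat{g}, \quad \hat{g} \text{ has } k \text{ non-zeros} \quad\Rightarrow\quad 2km \text{ multiplications} \\
\text{ratio:}\quad & \frac{2km}{2nm} = \frac{k}{n}, \qquad \text{e.g. } \frac{20}{500} = 4\%
\end{aligned}
$$

Because $\hat{g}$ has only $k$ non-zero entries, $\hat{g}\,x^{\top}$ has only $k$ non-zero rows, which is why only $k$ rows of the weight matrix are modified in this layout.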

Abstract

We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at…
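
To illustrate the mechanism described in the abstract, here is a minimal NumPy sketch of a top-$k$ sparsified backward pass for one linear layer `y = W @ x`. It is not the authors' released code: the function names `topk_sparsify` and `linear_backward_topk`, the row-major layout, and the single-example (no batch) setting are all assumptions made for illustration.

```python
import numpy as np


def topk_sparsify(grad, k):
    """Return a copy of a 1-D gradient with all but the k
    largest-magnitude entries set to zero, plus the kept indices."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if k >= grad.size:
        return grad.copy(), np.arange(grad.size)
    # argpartition places the k largest |grad| entries in the last k slots
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse, idx


def linear_backward_topk(W, x, grad_y, k):
    """Backward pass for y = W @ x that uses only the top-k entries of
    grad_y (hypothetical helper; assumes W has shape (n, m))."""
    _, idx = topk_sparsify(grad_y, k)
    g = grad_y[idx]                    # the k kept gradient values
    grad_W = np.zeros_like(W)
    grad_W[idx, :] = np.outer(g, x)    # only k rows of grad_W are non-zero
    grad_x = W[idx, :].T @ g           # touches only k rows of W
    return grad_W, grad_x, idx


# Usage with illustrative sizes: keep 20 of 500 output gradients (4%).
rng = np.random.default_rng(0)
W = rng.standard_normal((500, 64))
x = rng.standard_normal(64)
grad_y = rng.standard_normal(500)
grad_W, grad_x, kept = linear_backward_topk(W, x, grad_y, k=20)
```

Both products work on `k` rows instead of all `n`, which is where the roughly `k / n` cost reduction comes from in this sketch. A practical implementation would also need to handle mini-batches and the cost of the top-$k$ selection, neither of which is covered here.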

Taxonomy

Topics: Neural Networks and Applications · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning