meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang

TL;DR
This paper introduces meProp, a sparsified backpropagation technique that updates only a small subset of weights, significantly reducing computation and overfitting while improving model accuracy.
Contribution
The paper presents a novel sparsification method for back propagation that keeps only the top-k gradient elements (by magnitude), so that only the corresponding weights are updated, leading to faster training and better generalization.
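The top-k selection at the core of this idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code. It assumes the gradient is a flat list of floats and that ties in magnitude are broken by the lower index. The function name top_k_sparsify is hypothetical.

```python
from typing import List


def top_k_sparsify(grad: List[float], k: int) -> List[float]:
    """Keep the k entries of `grad` with the largest absolute value; zero the rest.

    Hypothetical helper for illustration; ties in magnitude go to the lower index.
    """
    if k <= 0:
        return [0.0] * len(grad)
    # Rank indices by magnitude (largest first), breaking ties by position.
    order = sorted(range(len(grad)), key=lambda i: (-abs(grad[i]), i))
    keep = set(order[:k])
    # Entries outside the top-k are dropped from the update entirely.
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


print(top_k_sparsify([0.1, -0.9, 0.3, 0.05, -0.4], k=2))
# → [0.0, -0.9, 0.0, 0.0, -0.4]
```

Selection is by absolute value, so large negative gradient entries are kept just like large positive ones.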
Findings
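Updating only 1-4% of the weights at each back propagation pass suffices for effective training, without increasing the number of training iterations.
Sparsification reduces the computational cost of back propagation linearly, in proportion to k divided by the vector dimension.
Model accuracy is improved, rather than degraded, by the sparsified updates.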
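The linear cost reduction can be checked with simple arithmetic. The sketch below assumes a single linear layer y = Wx with W of shape n_out × n_in. It counts only the multiply-accumulate operations of the two backward matrix products (dW = dy x^T and dx = W^T dy) and deliberately ignores the cost of selecting the top-k entries. The helper backward_macs and the sizes used (500 units, k = 20) are hypothetical and are not the paper's configuration.

```python
from typing import Optional


def backward_macs(n_out: int, n_in: int, k: Optional[int] = None) -> int:
    """Multiply-accumulate count for the backward pass of y = W x (hypothetical helper).

    Full back propagation forms dW = dy x^T and dx = W^T dy, each costing
    n_out * n_in MACs. If only k entries of dy are nonzero, both products
    touch only k rows of W. The top-k selection itself is NOT counted.
    """
    rows = n_out if k is None else min(k, n_out)
    return 2 * rows * n_in


full = backward_macs(500, 500)          # dense backward pass
sparse = backward_macs(500, 500, k=20)  # top-20 sparsified backward pass
print(full, sparse, sparse / full)
# → 500000 20000 0.04
```

The ratio is exactly k / n_out, which is the "k divided by the vector dimension" reduction. Here, keeping 20 of 500 gradient entries corresponds to touching 4% of the layer's rows.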
Abstract
We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction (k divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each back propagation pass, and this does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at…
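The backward pass described in the abstract can be sketched for one linear layer. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. It assumes the layer is y = Wx with no bias and that W is stored with one row per output unit, so the k kept gradient entries select k rows. All names (forward, top_k_indices, meprop_backward) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def forward(W: Matrix, x: Vector) -> Vector:
    # Ordinary dense forward pass: y = W x (unchanged by the sparsification).
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def top_k_indices(v: Vector, k: int) -> List[int]:
    # Indices of the k largest-magnitude entries; ties go to the lower index.
    return sorted(range(len(v)), key=lambda i: (-abs(v[i]), i))[:k]


def meprop_backward(W: Matrix, x: Vector, dy: Vector, k: int) -> Tuple[Matrix, Vector]:
    """Sparsified backward pass for y = W x (hypothetical sketch).

    Only the top-k entries of dy are kept, so dW has at most k nonzero rows
    and dx = W^T dy_sparse is accumulated from those k rows of W only.
    """
    n_out, n_in = len(W), len(x)
    dW = [[0.0] * n_in for _ in range(n_out)]
    dx = [0.0] * n_in
    for i in top_k_indices(dy, k):
        g = dy[i]
        dW[i] = [g * xj for xj in x]  # row i of the outer product dy x^T
        for j in range(n_in):
            dx[j] += W[i][j] * g      # contribution of row i to W^T dy
    return dW, dx


W = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
x = [1.0, 2.0]
y = forward(W, x)                      # [5.0, -1.5, 3.0]
dW, dx = meprop_backward(W, x, dy=[0.2, -1.0, 0.5], k=2)
print(dW, dx)
# → [[0.0, 0.0], [-1.0, -2.0], [0.5, 1.0]] [1.0, 1.0]
```

Because dy is sparsified before either product is formed, only the k selected rows of W are read to build dx, and only those rows of dW are nonzero. A parameter update such as W[i] -= lr * dW[i] therefore needs to touch only k rows. With the opposite layout (one column per output unit), the same selection picks columns instead.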
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning
