TL;DR
This paper introduces a minimal effort back propagation method that sparsifies gradients and simplifies neural network models, significantly reducing computational costs while maintaining or improving accuracy.
Contribution
The authors propose a novel gradient sparsification technique and model simplification approach that accelerates training and decoding without sacrificing accuracy.
Findings
Most of the time, fewer than 5% of the weights need to be updated in each back propagation pass.
Model simplification often shrinks the model by around 9x without accuracy loss.
Accuracy can be improved through the proposed simplification.
Abstract
We propose a simple yet effective technique to simplify the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which will reduce the computational cost in both training and decoding, and potentially accelerate decoding in real-world applications. Surprisingly, experimental results demonstrate that most of the time we only need to update fewer than 5% of the weights at each back propagation pass.…
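The top-k sparsification described in the abstract can be illustrated with a minimal NumPy sketch for a single linear layer y = W x. This is not the authors' implementation: the function names (`topk_indices`, `sparse_linear_backward`, `rows_to_keep`) are hypothetical, the layout assumes one row of W per output unit, and the count threshold used to decide which rows are "seldom updated" is an assumption, since the abstract does not specify the rule.

```python
import numpy as np

def topk_indices(grad, k):
    """Indices of the k largest-magnitude entries of a 1-D gradient vector."""
    if k >= grad.size:
        return np.arange(grad.size)
    # argpartition places the k largest |values| in the last k positions
    return np.sort(np.argpartition(np.abs(grad), -k)[-k:])

def sparse_linear_backward(W, x, grad_y, k):
    """Backward pass for y = W @ x that uses only the top-k entries of grad_y.

    Returns the weight gradient (nonzero in at most k rows), the input
    gradient, and the indices of the rows that were touched.
    """
    rows = topk_indices(grad_y, k)
    grad_W = np.zeros_like(W)
    # Only k rows receive a gradient, so the cost scales with k, not with len(grad_y)
    grad_W[rows] = np.outer(grad_y[rows], x)
    grad_x = W[rows].T @ grad_y[rows]
    return grad_W, grad_x, rows

def rows_to_keep(update_counts, min_count):
    """Assumed pruning rule: keep rows updated at least min_count times."""
    return np.flatnonzero(update_counts >= min_count)

# Small demonstration with hand-picked values
W = np.arange(12.0).reshape(4, 3)
x = np.array([1.0, 2.0, 3.0])
grad_y = np.array([0.1, -5.0, 0.2, 3.0])
grad_W, grad_x, rows = sparse_linear_backward(W, x, grad_y, k=2)
print(rows)    # rows of W that receive an update
print(grad_W)  # all other rows stay zero
```

In a training loop, one could accumulate how often each row appears in `rows` and periodically pass those counts to `rows_to_keep` to drop the rarely updated rows, which is one plausible reading of the simplification step.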
