Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

Xu Sun; Xuancheng Ren; Shuming Ma; Bingzhen Wei; Wei Li; Jingjing Xu; Houfeng Wang; Yi Zhang

arXiv:1711.06528·cs.LG·March 13, 2019

TL;DR

This paper introduces a minimal effort back propagation method that sparsifies gradients and simplifies neural network models, significantly reducing computational costs while maintaining or improving accuracy.

Contribution

The authors propose a novel gradient sparsification technique and model simplification approach that accelerates training and decoding without sacrificing accuracy.
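The core of the gradient sparsification can be illustrated on a single linear layer y = Wx: the output gradient is cut down to its k largest-magnitude entries, so only k rows of W receive a gradient and the product with W^T only touches those rows. This is a minimal, dependency-free sketch, not the authors' implementation. It assumes W is stored with one row per output unit (the abstract notes the rows-versus-columns choice depends on the layout), covers one layer only (no activation function, no bias, no parameter update), and the function names `top_k_indices` and `sparse_linear_backward` are hypothetical.

```python
from typing import List, Tuple


def top_k_indices(values: List[float], k: int) -> List[int]:
    """Return the indices of the k entries with the largest magnitude, in ascending order."""
    by_magnitude = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    return sorted(by_magnitude[:k])


def sparse_linear_backward(
    W: List[List[float]],  # weights, shape (n_out, n_in); forward pass is y = W x
    x: List[float],  # layer input, length n_in
    grad_y: List[float],  # dL/dy, length n_out
    k: int,  # number of gradient entries to keep
) -> Tuple[List[int], List[List[float]], List[float]]:
    """Backward pass of y = W x that uses only the top-k entries of grad_y.

    Returns (kept, grad_W_rows, grad_x): the kept row indices, the gradient
    for those k rows of W (every other row gets zero gradient), and dL/dx.
    """
    kept = top_k_indices(grad_y, k)
    # dL/dW is the outer product grad_y x^T; only the k kept rows are nonzero.
    grad_W_rows = [[grad_y[i] * xj for xj in x] for i in kept]
    # dL/dx = W^T grad_y restricted to the kept rows: O(k * n_in) instead of O(n_out * n_in).
    grad_x = [0.0] * len(x)
    for i in kept:
        for j, w_ij in enumerate(W[i]):
            grad_x[j] += w_ij * grad_y[i]
    return kept, grad_W_rows, grad_x


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    x = [1.0, -1.0]
    grad_y = [0.1, -2.0, 0.5]
    kept, grad_W_rows, grad_x = sparse_linear_backward(W, x, grad_y, k=2)
    print(kept)         # [1, 2]
    print(grad_W_rows)  # [[-2.0, 2.0], [0.5, -0.5]]
    print(grad_x)       # [-3.5, -5.0]
```

In the usage example, the entry 0.1 is the smallest in magnitude, so row 0 of W is neither updated nor used when propagating the gradient to x.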

Findings

1. In most cases, fewer than 5% of the weights need to be updated in each back propagation pass.

2. Model simplification can often reduce the model by around 9x without any loss of accuracy.

3. The proposed simplification can also improve accuracy.
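A rough operation count for one linear layer y = Wx, with W of size n × m (n outputs, m inputs), shows why keeping only the top-k gradient entries gives a linear reduction in backward cost. The sizes n = 500 and k = 20 are hypothetical and only illustrate the ratio; they are not taken from the paper's experiments.

```latex
\begin{aligned}
\text{full backward:}\quad & \nabla W = (\nabla y)\,x^\top,\quad \nabla x = W^\top \nabla y
  && \Rightarrow\ \approx 2nm \text{ multiply-adds} \\
\text{top-}k\text{ backward:}\quad & \text{only the } k \text{ selected rows of } W \text{ enter both products}
  && \Rightarrow\ \approx 2km \text{ multiply-adds} \\
\text{ratio:}\quad & \frac{2km}{2nm} = \frac{k}{n},\qquad n = 500,\ k = 20:\ \ \frac{20}{500} = 4\%
\end{aligned}
```

With these sizes, each back propagation pass updates 4% of the layer's weights, which is in the range described by the first finding.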

Abstract

We propose a simple yet effective technique to simplify both the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified so that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which reduces the computational cost in both training and decoding and can potentially accelerate decoding in real-world applications. Surprisingly, experimental results demonstrate that most of the time we only need to update fewer than 5% of the weights at each back propagation pass.…
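The model simplification step described in the abstract can be sketched by counting how often each row of a layer's weight matrix was selected by the top-k sparsification and then deleting the seldom-updated rows. This is a minimal sketch under stated assumptions, not the paper's exact procedure. The fixed-fraction threshold `min_fraction` is a hypothetical criterion, the pruning is applied once rather than periodically during training, biases are ignored, and the names `count_updates` and `simplify_layer` are illustrative. Because removing output unit i of one layer also removes input i of the next layer, the sketch deletes the matching column of the next weight matrix as well.

```python
from typing import List, Sequence, Tuple


def count_updates(kept_per_step: Sequence[Sequence[int]], n_out: int) -> List[int]:
    """Count, for each output row, how many back propagation steps kept it in the top-k."""
    counts = [0] * n_out
    for kept in kept_per_step:
        for i in kept:
            counts[i] += 1
    return counts


def simplify_layer(
    W: List[List[float]],  # this layer's weights, shape (n_out, n_in)
    W_next: List[List[float]],  # next layer's weights, shape (n_next, n_out)
    counts: List[int],  # per-row update counts from count_updates
    num_steps: int,  # number of recorded back propagation steps
    min_fraction: float,  # hypothetical threshold, not taken from the paper
) -> Tuple[List[int], List[List[float]], List[List[float]]]:
    """Drop output units whose rows were updated in fewer than min_fraction of the steps."""
    keep = [i for i, c in enumerate(counts) if c >= min_fraction * num_steps]
    # Deleting unit i removes row i of W and the matching column i of W_next.
    new_W = [W[i] for i in keep]
    new_W_next = [[row[i] for i in keep] for row in W_next]
    return keep, new_W, new_W_next


if __name__ == "__main__":
    # Hypothetical record of which rows were in the top-k (k = 2) over 4 steps.
    history = [[0, 2], [0, 2], [2, 3], [0, 2]]
    counts = count_updates(history, n_out=4)
    print(counts)  # [3, 0, 4, 1]
    W = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
    W_next = [[10.0, 20.0, 30.0, 40.0]]
    keep, W_small, W_next_small = simplify_layer(W, W_next, counts, num_steps=4, min_fraction=0.5)
    print(keep)          # [0, 2]
    print(W_small)       # [[1.0, 1.0], [3.0, 3.0]]
    print(W_next_small)  # [[10.0, 30.0]]
```

Here the layer shrinks from 4 to 2 hidden units, so both the forward pass of this layer and the next layer's input dimension are halved, which is where the decoding speedup would come from.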
