Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction
Bowen Lei, Dongkuan Xu, Ruqi Zhang, Shuren He, Bani K. Mallick

TL;DR
This paper introduces an adaptive gradient correction method to accelerate and stabilize sparse neural network training, reducing training epochs and improving accuracy in resource-limited scenarios.
Contribution
We propose a novel adaptive gradient correction technique that improves convergence speed and stability of sparse training methods, applicable under standard and adversarial conditions.
Findings
Outperforms existing methods by up to 5.0% in accuracy when trained for the same number of epochs.
Reduces the number of training epochs by up to 52.1% to reach the same accuracy.
Demonstrates effectiveness across multiple datasets, models, and sparsity levels.
Abstract
Despite impressive performance, deep neural networks incur significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these costs; however, the sparsity constraints make the optimization harder, which increases training time and instability. In this work, we aim to overcome this problem and achieve space-time co-efficiency. To accelerate and stabilize the convergence of sparse training, we analyze the gradient changes and develop an adaptive gradient correction method. Specifically, we approximate the correlation between the current and previous gradients and use it to balance the two gradients into a corrected gradient. Our method can be used with the most popular sparse training pipelines under both standard and adversarial setups. Theoretically,…
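To make the idea of balancing the current and previous gradients concrete, here is a minimal sketch in plain Python. It is not the paper's formula, which the truncated abstract does not give. It assumes that the correlation is approximated by cosine similarity, that the balance is a convex blend capped by a hypothetical `max_weight` parameter, and that sparsity is represented by a binary `mask`. The function names are illustrative as well.

```python
import math

def correlation(cur, prev, eps=1e-12):
    """Cosine similarity between two flat gradient vectors (lists of floats).

    Assumption: cosine similarity stands in for the correlation estimate.
    """
    dot = sum(a * b for a, b in zip(cur, prev))
    norm = math.sqrt(sum(a * a for a in cur)) * math.sqrt(sum(b * b for b in prev))
    return dot / (norm + eps)

def corrected_gradient(cur, prev, mask, max_weight=0.5):
    """Blend the current and previous gradients, weighted by their similarity.

    Assumption: the previous gradient contributes only when it points in a
    similar direction, and never with more than `max_weight` of the total.
    """
    c = max_weight * max(0.0, correlation(cur, prev))
    blended = [(1.0 - c) * a + c * b for a, b in zip(cur, prev)]
    # Sparse training: pruned weights (mask == 0) receive no update.
    return [g * m for g, m in zip(blended, mask)]
```

In this sketch, an orthogonal or opposing previous gradient is ignored, so the update falls back to the current gradient. An aligned previous gradient is averaged in, which damps step-to-step noise. A real implementation would operate on framework tensors and would follow the coefficient defined in the paper.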
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
