Gradient-based Weight Density Balancing for Robust Dynamic Sparse Training
Mathias Parger, Alexander Ertl, Paul Eibensteiner, Joerg H. Mueller, Martin Winter, Markus Steinberger

TL;DR
This paper introduces a gradient-based method for dynamically balancing weight density across neural network layers during sparse training, leading to improved performance at high sparsity levels.
Contribution
It proposes Global Gradient-based Redistribution, a technique that uses gradient information to reallocate weights across all layers during training. Instead of keeping each layer's density fixed, it gives more weights to the layers that need them most.
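The cross-layer reallocation can be illustrated with a minimal pure-Python sketch. It assumes a RigL-style step: each layer prunes the same fraction of its smallest-magnitude active weights, and the freed budget is then regrown globally at the inactive positions with the largest gradient magnitude, in whichever layers they occur. This is a simplified stand-in, not the authors' exact criterion; the function `global_gradient_redistribution` and the flat-list layer representation are hypothetical.

```python
from typing import List


def global_gradient_redistribution(
    weights: List[List[float]],
    grads: List[List[float]],
    masks: List[List[bool]],
    prune_frac: float,
) -> List[List[bool]]:
    """Hypothetical prune/regrow step; returns new masks.

    Pruning is per layer (smallest |w| among active weights); regrowth is
    global (largest |grad| among inactive positions in any layer), so the
    total number of active weights is preserved while per-layer density can shift.
    """
    new_masks = [list(m) for m in masks]
    just_pruned = set()
    n_pruned = 0
    # 1) Per-layer magnitude pruning of a fixed fraction of the active weights.
    for li, (w, m) in enumerate(zip(weights, masks)):
        active = [i for i, on in enumerate(m) if on]
        k = int(prune_frac * len(active))
        active.sort(key=lambda i: abs(w[i]))
        for i in active[:k]:
            new_masks[li][i] = False
            just_pruned.add((li, i))
        n_pruned += k
    # 2) Global regrowth: rank inactive positions across ALL layers by |grad|,
    #    skipping positions that were pruned in this same step.
    candidates = [
        (abs(g[i]), li, i)
        for li, g in enumerate(grads)
        for i in range(len(g))
        if not new_masks[li][i] and (li, i) not in just_pruned
    ]
    candidates.sort(reverse=True)
    for _, li, i in candidates[:n_pruned]:
        new_masks[li][i] = True
    return new_masks


def densities(masks: List[List[bool]]) -> List[float]:
    """Fraction of active weights in each layer."""
    return [sum(m) / len(m) for m in masks]
```

With two four-weight layers at 50% density, where only the second layer has large gradients at its inactive positions, a step with `prune_frac=0.5` moves the freed budget into that layer: densities go from [0.5, 0.5] to [0.25, 0.75], while the total number of active weights stays at four.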
Findings
Less prone to unbalanced weight distribution at initialization
Achieves better performance at very high sparsity levels
Effective in redistributing weights during training
Abstract
Training a sparse neural network from scratch requires optimizing connections at the same time as the weights themselves. Typically, the weights are redistributed after a predefined number of weight updates, removing a fraction of the parameters of each layer and inserting them at different locations in the same layers. The density of each layer is determined using heuristics, often purely based on the size of the parameter tensor. While the connections per layer are optimized multiple times during training, the density of each layer remains constant. This leaves great unrealized potential, especially in scenarios with a high sparsity of 90% or more. We propose Global Gradient-based Redistribution, a technique that distributes weights across all layers, adding more weights to the layers that need them most. Our evaluation shows that our approach is less prone to unbalanced weight…
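For contrast, a size-based density heuristic fixes each layer's density before training. The sketch below shows one common heuristic of this kind, the Erdős–Rényi-Kernel (ERK) scaling popularized by RigL; whether this paper's baselines use exactly this variant is an assumption, and `erk_densities` is a hypothetical helper. Each layer's density is proportional to the sum of its tensor dimensions divided by their product, scaled by a single factor so that the overall density matches the target. Layers that would exceed density 1 are made dense and the factor is recomputed.

```python
import math
from typing import List, Tuple


def erk_densities(shapes: List[Tuple[int, ...]], target_density: float) -> List[float]:
    """Per-layer densities from tensor shapes (ERK-style sketch, 0 < target_density < 1)."""
    sizes = [math.prod(s) for s in shapes]
    # ERK score: sum of dimensions over number of parameters; small layers score higher.
    scores = [sum(s) / size for s, size in zip(shapes, sizes)]
    budget = target_density * sum(sizes)  # total number of nonzero weights allowed
    dense = set()  # indices of layers capped at density 1.0
    while True:
        # Spread the budget left over after the dense layers across the rest.
        remaining = budget - sum(sizes[i] for i in dense)
        denom = sum(scores[i] * sizes[i] for i in range(len(shapes)) if i not in dense)
        eps = remaining / denom
        over = [i for i in range(len(shapes)) if i not in dense and eps * scores[i] > 1.0]
        if not over:
            break
        dense.update(over)  # cap these layers and recompute the scale factor
    return [1.0 if i in dense else eps * scores[i] for i in range(len(shapes))]
```

For shapes `[(100, 10), (10, 10)]` at 10% overall density, the smaller layer receives about 16.9% density and the larger about 9.3%. In standard dynamic sparse training these values then stay constant; only the positions inside each layer change, which is the limitation Global Gradient-based Redistribution targets.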
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Human Pose and Action Recognition
