Gradient-based Weight Density Balancing for Robust Dynamic Sparse Training

Mathias Parger; Alexander Ertl; Paul Eibensteiner; Joerg H. Mueller; Martin Winter; Markus Steinberger

arXiv:2210.14012 · cs.LG · November 4, 2022

TL;DR

This paper introduces a gradient-based method for dynamically balancing weight density across neural network layers during sparse training, leading to improved performance at high sparsity levels.

Contribution

It proposes Global Gradient-based Redistribution, a technique that uses gradient information to reallocate weights across all layers during training, giving more weights to the layers that need them most.
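The summary does not spell out the exact pruning and regrowth criteria. Below is a minimal, hypothetical sketch of one global redistribution step, assuming magnitude-based pruning within each layer followed by regrowth by gradient magnitude (in the spirit of RigL), except that regrowth candidates are ranked across all layers at once, so a layer's density can change. The function name `global_gradient_redistribute` and the flat-list layer representation are illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple

# Hypothetical layer representation: (weights, mask, dense gradients), all flat lists.
Layer = Tuple[List[float], List[bool], List[float]]


def global_gradient_redistribute(layers: List[Layer], drop_fraction: float) -> List[int]:
    """Hypothetical sketch, not the paper's reference implementation.

    Prunes the smallest-magnitude active weights in each layer, then regrows the
    same total number of connections wherever the gradient magnitude is largest,
    in any layer. Returns the number of active weights per layer afterwards.
    """
    total_dropped = 0
    # Step 1: in each layer, deactivate the smallest-magnitude active weights.
    for weights, mask, _ in layers:
        active = sorted((i for i, m in enumerate(mask) if m), key=lambda i: abs(weights[i]))
        n_drop = int(drop_fraction * len(active))
        for i in active[:n_drop]:
            mask[i] = False
            weights[i] = 0.0
        total_dropped += n_drop

    # Step 2: gather every inactive position in every layer with its gradient magnitude.
    candidates = []
    for layer_idx, (_, mask, grads) in enumerate(layers):
        for i, m in enumerate(mask):
            if not m:
                candidates.append((abs(grads[i]), layer_idx, i))

    # Step 3: regrow globally, so layers with larger gradients gain density.
    candidates.sort(reverse=True)
    for _, layer_idx, i in candidates[:total_dropped]:
        weights, mask, _ = layers[layer_idx]
        mask[i] = True
        weights[i] = 0.0  # new connections start at zero
    return [sum(mask) for _, mask, _ in layers]
```

For example, with two layers of four weights each, two active per layer, and `drop_fraction=0.5`, each layer loses one weight; if the second layer has the larger gradients at its inactive positions, both regrown connections land there, giving per-layer counts of `[1, 3]` while the total of 4 active weights is preserved.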

Findings

1. Less prone to unbalanced weight distribution at initialization
2. Achieves better performance at very high sparsity levels
3. Effective in redistributing weights during training

Abstract

Training a sparse neural network from scratch requires optimizing the connections at the same time as the weights themselves. Typically, the weights are redistributed after a predefined number of weight updates by removing a fraction of the parameters of each layer and inserting them at different locations in the same layer. The density of each layer is determined using heuristics, often based purely on the size of the parameter tensor. While the connections within each layer are optimized multiple times during training, the density of each layer remains constant. This leaves considerable potential unrealized, especially at high sparsity levels of 90% or more. We propose Global Gradient-based Redistribution, a technique that distributes weights across all layers, adding more weights to the layers that need them most. Our evaluation shows that our approach is less prone to unbalanced weight…
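To make the baseline described above concrete, here is a minimal sketch under stated assumptions. It uses the Erdős–Rényi heuristic (density proportional to (n_in + n_out) / (n_in · n_out), as used in SET), restricted here to dense 2-D layers, as one example of a size-based density rule; layers that would exceed density 1 are made fully dense and the rest of the budget is re-solved. The per-layer step assumes magnitude pruning and random regrowth within the same layer. Both function names are hypothetical.

```python
import random
from typing import List, Tuple


def erdos_renyi_densities(shapes: List[Tuple[int, int]], density: float) -> List[float]:
    """Hypothetical helper: per-layer densities from the Erdos-Renyi heuristic.

    Layer l gets eps * (n_in + n_out) / (n_in * n_out), with eps chosen so that the
    total nonzero count equals density * (all parameters). Layers that would exceed
    density 1 are made fully dense and the remaining budget is re-solved.
    """
    total = sum(a * b for a, b in shapes)
    dense = set()
    eps = 0.0
    while True:
        budget = density * total - sum(shapes[i][0] * shapes[i][1] for i in dense)
        sparse = [i for i in range(len(shapes)) if i not in dense]
        if not sparse:
            break
        # Nonzeros of a sparse layer = eps * (n_in + n_out), so solve eps directly.
        eps = budget / sum(shapes[i][0] + shapes[i][1] for i in sparse)
        over = [i for i in sparse if eps * (shapes[i][0] + shapes[i][1]) > shapes[i][0] * shapes[i][1]]
        if not over:
            break
        dense.update(over)
    return [1.0 if i in dense else eps * (a + b) / (a * b) for i, (a, b) in enumerate(shapes)]


def prune_and_regrow_layer(weights: List[float], mask: List[bool],
                           drop_fraction: float, rng: random.Random) -> int:
    """Hypothetical helper: drop the smallest-magnitude active weights, regrow
    the same number at random inactive positions of the SAME layer."""
    active = sorted((i for i, m in enumerate(mask) if m), key=lambda i: abs(weights[i]))
    n_drop = int(drop_fraction * len(active))
    for i in active[:n_drop]:
        mask[i] = False
        weights[i] = 0.0
    inactive = [i for i, m in enumerate(mask) if not m]
    for i in rng.sample(inactive, n_drop):
        mask[i] = True  # new connection, initialized at zero
    return sum(mask)  # unchanged: the layer's density is fixed
```

Because each layer drops and regrows the same number of connections, its active count never changes, which is the fixed per-layer density the abstract identifies as a limitation. Small layers also receive higher densities under this heuristic purely because of their shape, regardless of how useful their weights turn out to be.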

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Human Pose and Action Recognition