OptG: Optimizing Gradient-driven Criteria in Network Sparsity
Yuxin Zhang, Mingbao Lin, Mengzhao Chen, Fei Chao, Rongrong Ji

TL;DR
OptG integrates supermask training with gradient-driven sparsity criteria to address the independence paradox, and it reports superior performance at high sparsity levels.
Contribution
This paper proposes OptG, a method that integrates supermask training into gradient-driven sparsity to address the independence paradox: gradient-driven criteria assume that weights are independent, whereas in fact weights influence one another.
Findings
OptG surpasses state-of-the-art methods at ultra-high sparsity levels.
Supermask training partly solves the independence paradox in gradient-driven sparsity.
OptG demonstrates significant performance improvements in sparse networks.
Abstract
Network sparsity has gained popularity mostly because it reduces network complexity. Extensive studies explore gradient-driven sparsity. Typically, these methods are built upon the premise of weight independence, which, however, contradicts the fact that weights are mutually influenced. Thus, their performance remains to be improved. In this paper, we propose to optimize gradient-driven sparsity (OptG) by solving this independence paradox. Our motivation comes from recent advances in supermask training, which show that high-performing sparse subnetworks can be located by simply updating mask values without modifying any weight. We prove that supermask training accumulates the criteria of gradient-driven sparsity for both removed and preserved weights, and that it can partly solve the independence paradox. Consequently, OptG integrates supermask training into…
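To make the supermask idea in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a toy linear model with a squared-error loss, a top-k mask over per-weight scores, and a straight-through estimator for the mask gradient. The names `topk_mask` and `train_supermask` are hypothetical. The weights stay frozen and only the scores are updated, and each score update is a weight-times-gradient term of the kind gradient-driven criteria rely on, so the final scores are the initial scores minus the learning rate times the accumulated sum of those terms.

```python
import numpy as np


def topk_mask(scores, k):
    """Binary mask that keeps the k highest-scoring weights."""
    mask = np.zeros_like(scores)
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask


def train_supermask(X, y, w, k, steps=200, lr=0.05, seed=0):
    """Learn a sparse mask over frozen weights w (toy linear model, assumed setup)."""
    rng = np.random.default_rng(seed)
    scores = rng.normal(scale=0.01, size=w.shape)  # learnable mask scores
    init_scores = scores.copy()
    accumulated = np.zeros_like(w)  # running sum of weight * gradient terms

    for _ in range(steps):
        mask = topk_mask(scores, k)
        residual = X @ (w * mask) - y
        # Gradient of the mean squared error w.r.t. the effective weights w * mask.
        grad_eff = 2.0 * X.T @ residual / len(y)
        # Straight-through estimator: treat d(mask)/d(scores) as 1, so the score
        # gradient is weight * gradient. Pruned weights receive updates too.
        grad_scores = grad_eff * w
        scores -= lr * grad_scores
        accumulated += grad_scores

    return topk_mask(scores, k), scores, init_scores, accumulated


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(64, 8))
    w = rng.normal(size=8)                 # frozen, never updated
    y = X @ (w * (np.arange(8) < 4))       # target uses only the first 4 weights
    mask, scores, init_scores, acc = train_supermask(X, y, w, k=4)
    print("learned mask:", mask)
```

Because the score gradient is computed for every weight, including those currently masked out, a pruned weight can re-enter the mask later if its accumulated term grows large enough. This is the sense in which the criteria are accumulated for both removed and preserved weights.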
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and ELM · Advanced Computing and Algorithms
