Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off
Shaoyi Huang, Bowen Lei, Dongkuan Xu, Hongwu Peng, Yue Sun, Mimi Xie, Caiwen Ding

TL;DR
This paper introduces a dynamic sparse training method that balances exploration and exploitation, leading to more accurate and efficient sparse neural networks that outperform existing methods across various tasks.
Contribution
It proposes a novel acquisition function for dynamic sparse training, with theoretical guarantees and improved accuracy over state-of-the-art methods.
Findings
Reports higher accuracy than dense models at sparsity levels of up to 98%.
Outperforms SOTA sparse training methods on multiple datasets.
Provides theoretical convergence guarantees for the proposed method.
Abstract
Over-parameterized deep neural networks (DNNs) have shown high prediction accuracy in many applications. Although effective, their large number of parameters hinders adoption on resource-limited devices and has an outsized environmental impact. Sparse training (using a fixed number of nonzero weights in each iteration) could significantly reduce training costs by shrinking the model size. However, existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies, resulting in local minima and low accuracy. In this work, we consider dynamic sparse training as a sparse connectivity search problem and design an exploitation and exploration acquisition function to escape from local optima and saddle points. We further design an acquisition function and provide the theoretical guarantees for the proposed method and clarify its…
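The drop-and-grow idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the scoring rule (gradient magnitude plus a bonus of the form `c * ln(step) / (count + eps)`) is an assumed, UCB-style stand-in for an exploitation and exploration acquisition function, and the function name `drop_and_grow`, the magnitude-based drop rule, and all parameters are hypothetical.

```python
import math


def drop_and_grow(weights, mask, grads, counts, step, k, c=1.0, eps=1e-3):
    """One hypothetical drop-and-grow update on a flat list of weights.

    weights, grads: lists of floats. mask: list of 0/1 flags (1 = active).
    counts: number of updates for which each weight has been active.
    step: current update index (>= 1). k: weights to drop and regrow.
    c: assumed exploration coefficient; c=0 gives a purely greedy rule.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(k, len(active), len(inactive))

    # Drop: deactivate the k active weights with the smallest magnitude.
    dropped = sorted(active, key=lambda i: abs(weights[i]))[:k]

    # Grow: rank inactive weights by exploitation (gradient magnitude)
    # plus an exploration bonus that favors rarely active weights.
    def score(i):
        return abs(grads[i]) + c * math.log(step) / (counts[i] + eps)

    grown = sorted(inactive, key=score, reverse=True)[:k]

    new_weights, new_mask = list(weights), list(mask)
    for i in dropped:
        new_mask[i] = 0
        new_weights[i] = 0.0
    for i in grown:
        new_mask[i] = 1
        new_weights[i] = 0.0  # newly grown weights start at zero

    # Dropping k and growing k keeps the number of nonzero slots fixed.
    new_counts = [n + m for n, m in zip(counts, new_mask)]
    return new_weights, new_mask, new_counts
```

With `c=0` the grow step reduces to a greedy, gradient-only choice; a positive `c` lets a never-activated weight win over one with a larger gradient, which is the exploration behavior the abstract contrasts with purely greedy strategies.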
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Machine Learning and Data Classification
Methods: VGG-19 (Visual Geometry Group 19-layer CNN)
