Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off

Shaoyi Huang; Bowen Lei; Dongkuan Xu; Hongwu Peng; Yue Sun; Mimi Xie; Caiwen Ding

arXiv:2211.16667·cs.LG·April 25, 2023·1 cites

Open Access

TL;DR

This paper introduces a dynamic sparse training method that balances exploration and exploitation, leading to more accurate and efficient sparse neural networks that outperform existing methods across various tasks.

Contribution

It proposes a novel acquisition function for dynamic sparse training, with theoretical guarantees and improved accuracy over state-of-the-art methods.

Findings

01. Achieves up to 98% sparsity with better accuracy than dense models.
02. Outperforms SOTA sparse training methods on multiple datasets.
03. Provides theoretical convergence guarantees for the proposed method.

Abstract

Over-parameterized deep neural networks (DNNs) have shown high prediction accuracy in many applications. Although effective, their large number of parameters hinders deployment on resource-limited devices and has an outsized environmental impact. Sparse training (using a fixed number of nonzero weights in each iteration) could significantly reduce training costs by shrinking the model size. However, existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies, which leads to local minima and low accuracy. In this work, we treat dynamic sparse training as a sparse connectivity search problem and design an exploitation and exploration acquisition function to escape from local optima and saddle points. We further provide theoretical guarantees for the proposed method and clarify its…
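The drop-and-grow idea in the abstract can be illustrated with a minimal sketch. It assumes a single layer, magnitude-based dropping, and a grow score that adds a count-based exploration bonus to the gradient magnitude. The function name `drop_and_grow`, its parameters (`drop_frac`, `c`, `eps`), and the exact form of the bonus are hypothetical choices for illustration; they are not taken from the paper and should not be read as its acquisition function.

```python
import numpy as np

def drop_and_grow(weights, mask, grads, counts, step,
                  drop_frac=0.3, c=1.0, eps=1e-3):
    """One illustrative drop-and-grow update for a single layer.

    weights, grads : arrays of the same shape (dense storage for simplicity)
    mask           : boolean array, True where a weight is currently active
    counts         : how often each weight has been active so far (assumed bookkeeping)
    step           : current update index, must be >= 1
    Returns a new boolean mask with the same number of active weights.
    """
    flat_mask = mask.astype(bool).ravel()      # flattened copy of the mask
    active = np.flatnonzero(flat_mask)
    inactive = np.flatnonzero(~flat_mask)

    # Swap the same number of weights in and out so sparsity stays fixed.
    n_swap = min(int(drop_frac * active.size), inactive.size)
    new_mask = flat_mask.copy()
    if n_swap == 0:
        return new_mask.reshape(mask.shape)

    # Drop: deactivate the active weights with the smallest magnitude.
    magnitudes = np.abs(weights.ravel()[active])
    drop_idx = active[np.argsort(magnitudes)[:n_swap]]
    new_mask[drop_idx] = False

    # Grow: score the previously inactive weights.
    # Exploitation term: gradient magnitude (a greedy signal).
    exploit = np.abs(grads.ravel()[inactive])
    # Exploration term (assumed form): larger for rarely activated weights.
    explore = c * np.log(step) / (counts.ravel()[inactive] + eps)
    score = exploit + explore
    grow_idx = inactive[np.argsort(-score)[:n_swap]]
    new_mask[grow_idx] = True

    return new_mask.reshape(mask.shape)
```

With `c=0` the grow step reduces to a purely greedy, gradient-based choice; a larger `c` shifts the selection toward connections that have rarely or never been active, which is the trade-off the title refers to.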

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Machine Learning and Data Classification

Methods: Visual Geometry Group 19 Layer CNN (VGG-19)