Differentiable Sparsification for Deep Neural Networks
Yongjin Lee

TL;DR
This paper introduces a novel fully differentiable sparsification technique for deep neural networks that enables end-to-end learning of sparse structures and weights, reducing resource consumption and simplifying model optimization.
Contribution
It presents the first fully differentiable sparsification method that can be integrated into standard training procedures for various neural network architectures.
Findings
Effectively zeroes out unimportant parameters during training.
Can be applied to different neural network architectures with minimal modifications.
Enables end-to-end learning of sparse models using stochastic gradient descent.
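As a rough illustration of how a parameter can reach exactly zero under plain gradient descent, the sketch below trains a tiny linear model whose effective weights are a soft-thresholded version of the raw parameters, with an L1 penalty on the effective weights. This is an assumed reparameterization chosen for illustration and not necessarily the paper's formulation. The names `soft_threshold` and `train`, the fixed threshold `t`, the toy data, and the use of full-batch rather than stochastic updates are all hypothetical simplifications.

```python
from itertools import product


def soft_threshold(w, t):
    """Return sign(w) * max(|w| - t, 0); exactly 0.0 whenever |w| <= t."""
    sign = 1.0 if w >= 0 else -1.0
    return sign * max(abs(w) - t, 0.0)


def train(steps=200, lr=0.1, lam=0.1, t=0.1):
    # Toy data (hypothetical): all 8 sign patterns of 3 features.
    # The target depends on features 0 and 1 only; feature 2 is irrelevant.
    xs = [list(p) for p in product([-1.0, 1.0], repeat=3)]
    ys = [2.0 * x[0] + 1.0 * x[1] for x in xs]
    n = len(xs)

    w = [0.5, 0.5, 0.5]  # raw parameters, initialized above the threshold
    for _ in range(steps):
        # Effective weights used by the model; may be exactly zero.
        w_eff = [soft_threshold(wj, t) for wj in w]
        residuals = [
            sum(a * b for a, b in zip(w_eff, x)) - y for x, y in zip(xs, ys)
        ]
        for j in range(3):
            if w_eff[j] == 0.0:
                # Gate closed: d(w_eff)/d(w) is zero, so w[j] no longer moves.
                continue
            # Gradient of the mean squared error w.r.t. w_eff[j]
            # (d(w_eff)/d(w) is 1 while |w[j]| > t).
            grad_mse = 2.0 / n * sum(r * x[j] for r, x in zip(residuals, xs))
            # Subgradient of lam * |w_eff[j]|.
            grad_l1 = lam * (1.0 if w_eff[j] > 0 else -1.0)
            w[j] -= lr * (grad_mse + grad_l1)

    return [soft_threshold(wj, t) for wj in w]


if __name__ == "__main__":
    print(train())
```

In this toy setting the effective weight of the irrelevant feature ends at exactly 0.0, while the two relevant weights settle slightly below their true values (2.0 and 1.0) because of the L1 shrinkage. One limitation of this simplified sketch is that a weight whose gate has closed receives no gradient and cannot recover.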
Abstract
Deep neural networks have significantly alleviated the burden of feature engineering, but comparable efforts are now required to determine effective architectures for these networks. Furthermore, as network sizes have become excessively large, a substantial amount of resources is invested in reducing their sizes. These challenges can be effectively addressed through the sparsification of over-complete models. In this study, we propose a fully differentiable sparsification method for deep neural networks, which can zero out unimportant parameters by directly optimizing a regularized objective function with stochastic gradient descent. Consequently, the proposed method can learn both the sparsified structure and weights of a network in an end-to-end manner. It can be directly applied to various modern deep neural networks and requires minimal modification to the training process. To the…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
