Dep-$L_0$: Improving $L_0$-based Network Sparsification via Dependency Modeling
Yang Li, Shihao Ji

TL;DR
This paper introduces Dep-$L_0$, a dependency modeling approach for $L_0$ regularization in neural network pruning, significantly improving sparsification performance on large-scale datasets like ImageNet.
Contribution
It proposes a dependency modeling of binary gates using an MLP to enhance $L_0$ regularization for network sparsification, addressing limitations of mean-field approximation.
Findings
Dep-$L_0$ outperforms previous $L_0$ methods on ImageNet.
Dependency modeling improves sparsification on large datasets.
Achieves state-of-the-art results compared to other sparsification algorithms.
Abstract
Training deep neural networks with an $L_0$ regularization is one of the prominent approaches for network pruning or sparsification. The method prunes the network during training by encouraging weights to become exactly zero. However, recent work of Gale et al. reveals that although this method yields high compression rates on smaller datasets, it performs inconsistently on large-scale learning tasks, such as ResNet50 on ImageNet. We analyze this phenomenon through the lens of variational inference and find that it is likely due to the independent modeling of binary gates, the mean-field approximation, which is known in Bayesian statistics for its poor performance due to the crude approximation. To mitigate this deficiency, we propose a dependency modeling of binary gates, which can be modeled effectively as a multi-layer perceptron (MLP). We term our algorithm Dep-$L_0$ as it prunes…
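To make the contrast between independent and dependent gates concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the hard-concrete relaxation commonly used with $L_0$ regularization (the constants `GAMMA`, `ZETA`, `BETA` are conventional values, not taken from this page). The function `dependent_gates` and its layer-by-layer chain of dense maps are hypothetical: they show one way an MLP-like generator could let each layer's gate parameters depend on the gates sampled for the previous layer, whereas a mean-field model would give every gate its own free parameter.

```python
import numpy as np

# Assumed hard-concrete constants (stretch limits and temperature).
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_gates(log_alpha, rng):
    """Draw relaxed binary gates in [0, 1] from a hard-concrete distribution."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
    s = sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / BETA)
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def expected_l0(log_alpha):
    """Expected number of non-zero gates, used as a differentiable L0 penalty."""
    return float(np.sum(sigmoid(log_alpha - BETA * np.log(-GAMMA / ZETA))))

def dependent_gates(weights, first_input, rng):
    """Hypothetical dependent gate generator.

    Each layer's log_alpha is a dense function of the gates sampled for the
    previous layer, so the gates are no longer modeled independently.
    """
    log_alphas, gates = [], []
    prev = first_input
    for w, b in weights:
        log_alpha = prev @ w + b
        z = sample_gates(log_alpha, rng)
        log_alphas.append(log_alpha)
        gates.append(z)
        prev = z  # the next layer is conditioned on these sampled gates
    return log_alphas, gates

rng = np.random.default_rng(0)
layer_sizes = [4, 6, 3]  # illustrative number of gates per layer

# Mean-field baseline: one free, independent parameter per gate.
log_alpha_mf = [rng.normal(size=n) for n in layer_sizes]
gates_mf = [sample_gates(a, rng) for a in log_alpha_mf]

# Dependent variant: parameters of layer l are computed from gates of layer l-1.
in_dims = [layer_sizes[0]] + layer_sizes[:-1]
weights = [(rng.normal(size=(i, o)), np.zeros(o)) for i, o in zip(in_dims, layer_sizes)]
log_alphas, gates = dependent_gates(weights, np.ones(layer_sizes[0]), rng)

penalty = sum(expected_l0(a) for a in log_alphas)
```

In a training loop, each sampled gate vector would multiply the corresponding group of weights or channels, and `penalty` (scaled by a regularization strength) would be added to the task loss. Gradient computation is omitted here, since the sketch uses plain NumPy.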
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Pruning · Variational Inference
