Dep-$L_0$: Improving $L_0$-based Network Sparsification via Dependency Modeling

Yang Li; Shihao Ji

arXiv:2107.00070·cs.LG·July 2, 2021

TL;DR

This paper introduces Dep-$L_0$, which models the dependencies among the binary gates used in $L_0$-regularized network pruning and thereby improves sparsification on large-scale datasets such as ImageNet.

Contribution

It proposes modeling the dependencies among binary gates with an MLP, which strengthens $L_0$ regularization for network sparsification and addresses the limitations of the mean-field approximation.
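To make "dependency modeling" concrete, here is a minimal, hypothetical sketch in plain Python. It is not the code in the official repository, and the paper's exact generator architecture may differ. A small MLP generator has one layer per pruned network layer, and its $l$-th layer emits one gate logit (log α) per channel of the $l$-th pruned layer. Because each generator layer consumes the previous layer's activations, the gate parameters of layer $l$ depend on those of layer $l-1$. A mean-field model would instead learn every log α as an independent free parameter. The names `build_generator` and `gate_logits`, the ReLU activations, the fixed input embedding, and the initialization are all assumptions made for illustration.

```python
import math
import random


def linear(x, weight, bias):
    """Affine map; `weight` is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_layer(n_in, n_out, rng):
    """Hypothetical init: Gaussian weights scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def build_generator(embed_dim, channels, seed=0):
    """One generator layer per pruned network layer (assumed design).

    channels[l] is the number of gates (e.g. channels) in network layer l,
    so generator layer l outputs channels[l] logits.
    """
    rng = random.Random(seed)
    sizes = [embed_dim] + list(channels)
    return [init_layer(sizes[i], sizes[i + 1], rng) for i in range(len(channels))]


def gate_logits(embedding, generator):
    """Return per-layer gate logits (log alpha).

    Layer l's logits are computed from layer l-1's activations, so the
    gates of different network layers are no longer modeled independently.
    """
    h = embedding
    log_alphas = []
    for weight, bias in generator:
        pre = linear(h, weight, bias)
        log_alphas.append(pre)            # logits for this network layer's gates
        h = [max(0.0, v) for v in pre]    # ReLU output feeds the next generator layer
    return log_alphas
```

For example, `gate_logits([1.0, 0.5, -0.5], build_generator(3, [4, 2, 5]))` returns three lists of logits with lengths 4, 2, and 5. In training, these logits would parameterize the stochastic gates, and the generator weights would be learned jointly with the network.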

Findings

01. Dep-$L_0$ outperforms previous $L_0$-based methods on ImageNet.

02. Dependency modeling improves sparsification on large datasets.

03. Dep-$L_0$ achieves state-of-the-art results compared with other sparsification algorithms.

Abstract

Training deep neural networks with an $L_{0}$ regularization is one of the prominent approaches for network pruning or sparsification. The method prunes the network during training by encouraging weights to become exactly zero. However, recent work by Gale et al. reveals that although this method yields high compression rates on smaller datasets, it performs inconsistently on large-scale learning tasks, such as ResNet50 on ImageNet. We analyze this phenomenon through the lens of variational inference and find that it is likely due to the independent modeling of binary gates, i.e. the mean-field approximation, which is known in Bayesian statistics to perform poorly because of its crude approximation. To mitigate this deficiency, we propose a dependency modeling of binary gates, which can be modeled effectively as a multi-layer perceptron (MLP). We term our algorithm Dep-$L_{0}$ as it prunes…
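The $L_0$ approach the abstract refers to is commonly implemented with hard concrete gates (Louizos et al., 2018), which make the expected number of nonzero gates differentiable. The sketch below assumes that formulation and its usual hyperparameters ($\gamma=-0.1$, $\zeta=1.1$, $\beta=2/3$). Each gate is sampled independently, i.e. the mean-field setting that Dep-$L_0$ replaces. The function names and the penalty weight `lam` are illustrative, not taken from the paper or its repository.

```python
import math
import random

# Assumed hard concrete hyperparameters: stretch interval (GAMMA, ZETA) and
# temperature BETA, as commonly used for L0 regularization (not from Dep-L0).
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_gate(log_alpha, rng):
    """Draw one stochastic gate z in [0, 1] from a hard concrete distribution."""
    u = rng.uniform(1e-6, 1.0 - 1e-6)                       # keep log() finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    s_bar = s * (ZETA - GAMMA) + GAMMA                       # stretch to (GAMMA, ZETA)
    return min(1.0, max(0.0, s_bar))                         # clip: exact 0 prunes the weight


def expected_l0(log_alpha):
    """P(z != 0): the differentiable surrogate for one gate's L0 contribution."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def deterministic_gate(log_alpha):
    """Noise-free gate used at inference time."""
    return min(1.0, max(0.0, sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA))


def l0_penalty(log_alphas, lam):
    """Expected number of active gates, scaled by a hypothetical weight lam."""
    return lam * sum(expected_l0(a) for a in log_alphas)
```

At inference, `deterministic_gate` replaces sampling. With these hyperparameters, `deterministic_gate(-5.0)` evaluates to 0.0 (pruned) and `deterministic_gate(5.0)` to 1.0 (kept). During training, `l0_penalty` would be added to the task loss so that gradient descent pushes unneeded log α values down.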

Code & Models

Repositories

leo-yangli/dep-l0
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Pruning · Variational Inference