TL;DR
This paper introduces a group Lasso regularizer that simultaneously trains a neural network's weights, prunes hidden neurons, and selects input features, yielding more compact networks.
Contribution
It extends group Lasso to neural networks, enabling joint optimization of weights, neurons, and features in a unified framework using standard optimization routines.
Findings
Achieves competitive performance with significantly smaller networks.
Effectively reduces input features while maintaining accuracy.
Outperforms classical weight decay and Lasso penalties in experiments.
Abstract
In this paper, we consider the joint task of simultaneously optimizing (i) the weights of a deep neural network, (ii) the number of neurons for each hidden layer, and (iii) the subset of active input features (i.e., feature selection). While these problems are generally dealt with separately, we present a simple regularized formulation allowing to solve all three of them in parallel, using standard optimization routines. Specifically, we extend the group Lasso penalty (originated in the linear regression literature) in order to impose group-level sparsity on the network's connections, where each group is defined as the set of outgoing weights from a unit. Depending on the specific case, the weights can be related to an input variable, to a hidden neuron, or to a bias unit, thus performing simultaneously all the aforementioned tasks in order to obtain a compact network. We perform an…
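As a rough illustration of the group-level penalty the abstract describes, the sketch below treats each row of a layer's weight matrix as one group: the outgoing weights of a single unit. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names `group_lasso_penalty` and `active_units`, the tolerance `tol`, the toy matrix `W`, and the square-root-of-group-size scaling (a common convention in the group Lasso literature) are all assumptions introduced here.

```python
import math


def group_lasso_penalty(weight_matrix):
    """Sum over units of sqrt(group size) * L2 norm of the unit's outgoing weights.

    weight_matrix[i][j] is assumed to be the weight from unit i (an input
    feature or a hidden neuron) to unit j of the next layer, so row i is the
    group of outgoing weights of unit i.
    """
    total = 0.0
    for row in weight_matrix:
        l2_norm = math.sqrt(sum(w * w for w in row))
        # The sqrt(len(row)) scaling is an assumed convention, not a quoted formula.
        total += math.sqrt(len(row)) * l2_norm
    return total


def active_units(weight_matrix, tol=1e-8):
    """Indices of units whose outgoing weights are not all (near) zero.

    A unit with an all-zero row no longer influences the next layer, so it can
    be dropped: an input row corresponds to a discarded feature, a hidden row
    to a pruned neuron.
    """
    return [
        i
        for i, row in enumerate(weight_matrix)
        if math.sqrt(sum(w * w for w in row)) > tol
    ]


# Toy layer with three units feeding two units in the next layer (hypothetical values).
W = [
    [3.0, 4.0],  # group norm 5
    [0.0, 0.0],  # fully zeroed group: this unit can be removed
    [1.0, 0.0],  # group norm 1
]

penalty = group_lasso_penalty(W)
kept = active_units(W)
```

In training, a penalty like this would be added to the loss with a regularization weight. Because the L2 norm of each group is not squared, the penalty tends to drive entire rows to zero together, which is what allows whole features or neurons to be removed rather than individual connections.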
Taxonomy
Methods: Weight Decay · Linear Regression
