Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization
Aditya Biswas

TL;DR
This paper introduces PSiLON Net, an MLP architecture that combines $L_1$ weight normalization with 1-path-norm regularization. The design simplifies the 1-path-norm and biases training toward efficient learning and near-sparse parameters, and the paper extends it with a pruning method and a residual block.
Contribution
It presents a neural network architecture whose 1-path-norm takes a much simpler form, a pruning method for reaching exact sparsity late in training, and a simplified residual block for which a subset of the paths in the 1-path-norm is enough to bound the Lipschitz constant.
Findings
Effective regularization with 1-path-norm improves generalization.
Pruning achieves exact sparsity in trained models.
Strong performance in small data regimes with overparameterized networks.
Abstract
We present PSiLON Net, an MLP architecture that uses $L_1$ weight normalization for each weight vector and shares the length parameter across the layer. The 1-path-norm bounds the Lipschitz constant of a neural network and is indicative of how well the network generalizes. We show how PSiLON Net's design drastically simplifies the 1-path-norm while providing an inductive bias towards efficient learning and near-sparse parameters. We propose a pruning method to achieve exact sparsity in the final stages of training, if desired. To exploit the inductive bias of residual networks, we present a simplified residual block that leverages concatenated ReLU activations. For networks constructed with such blocks, we prove that considering only a subset of possible paths in the 1-path-norm is sufficient to bound the Lipschitz constant. Using the 1-path-norm and this improved bound as regularizers,…
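To make the simplification described in the abstract concrete, here is a minimal NumPy sketch. It is not the paper's implementation. It assumes a bias-free MLP in which every layer, including the output layer, is $L_1$ weight-normalized with one shared length parameter; the function names, layer widths, and length values are hypothetical. Under these assumptions every row of $|W_k|$ sums to $|g_k|$, so the 1-path-norm (the sum over all input-to-output paths of the product of absolute weights) collapses to the number of outputs times $\prod_k |g_k|$.

```python
import numpy as np

def l1_weight_norm(v, g):
    """Rescale each row of v to have L1 norm |g|; g is shared by the whole layer."""
    return g * v / np.abs(v).sum(axis=1, keepdims=True)

def one_path_norm(weights):
    """Sum over all input-to-output paths of the product of |weight| along the path.

    Computed as 1^T |W_K| ... |W_1| 1 for weights given as [W_1, ..., W_K],
    each of shape (n_out, n_in).
    """
    acc = np.ones(weights[0].shape[1])
    for w in weights:
        acc = np.abs(w) @ acc
    return acc.sum()

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 4]      # hypothetical layer widths (input, hidden, hidden, output)
gains = [1.5, -0.7, 2.0]    # hypothetical shared length parameters, one per layer

weights = [
    l1_weight_norm(rng.normal(size=(n_out, n_in)), g)
    for n_in, n_out, g in zip(sizes[:-1], sizes[1:], gains)
]

full = one_path_norm(weights)                        # explicit sum over all paths
closed_form = sizes[-1] * np.prod(np.abs(gains))     # n_out * prod |g_k|
print(full, closed_form)
```

In this sketch the regularizer depends only on the per-layer length parameters rather than on every individual weight, which is the kind of simplification the abstract refers to. The exact form in the paper may differ, for example in how the output layer is treated.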
Taxonomy
Topics: Numerical Methods and Algorithms
Methods: Weight Normalization · Pruning
