Improving Deep Learning Optimization through Constrained Parameter Regularization
Jörg K.H. Franke, Michael Hefenbrock, Gregor Koehler, Frank Hutter

TL;DR
This paper introduces Constrained Parameter Regularization (CPR), a novel regularization method that enforces upper bounds on parameter norms, improving deep learning training by addressing limitations of traditional weight decay.
Contribution
CPR offers a new regularization approach that constrains parameter norms individually, using an augmented Lagrangian method, with minimal runtime overhead and no extensive hyperparameter tuning.
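As a rough sketch of what "constraining parameter norms individually" means, the problem can be written as below. The notation is assumed for illustration and does not come from this summary: $L$ is the training loss, $\theta^{j}$ the $j$-th parameter matrix, $R$ the chosen measure (for example the squared L2-norm), $\kappa^{j}$ its upper bound, $\lambda^{j} \ge 0$ a multiplier, and $\mu > 0$ a fixed update rate. The multiplier rule shown is the textbook augmented Lagrangian update for inequality constraints; the paper's adaptation may differ in detail.

```latex
\min_{\theta} \; L(\theta)
\quad \text{s.t.} \quad R(\theta^{j}) \le \kappa^{j}, \qquad j = 1, \dots, J

\lambda^{j}_{t+1} = \max\!\Bigl(0,\; \lambda^{j}_{t} + \mu \bigl(R(\theta^{j}_{t}) - \kappa^{j}\bigr)\Bigr)
```

Under this reading, each parameter matrix carries its own multiplier, which grows only while that matrix exceeds its bound and decays back toward zero otherwise, in contrast to the single fixed coefficient of weight decay.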
Findings
CPR outperforms weight decay in vision and language tasks.
CPR enhances pre-training and fine-tuning performance.
CPR maintains low computational overhead.
Abstract
Regularization is a critical component in deep learning. The most commonly used approach, weight decay, applies a constant penalty coefficient uniformly across all parameters. This may be overly restrictive for some parameters, while insufficient for others. To address this, we present Constrained Parameter Regularization (CPR) as an alternative to traditional weight decay. Unlike the uniform application of a single penalty, CPR enforces an upper bound on a statistical measure, such as the L2-norm, of individual parameter matrices. Consequently, learning becomes a constrained optimization problem, which we tackle using an adaptation of the augmented Lagrangian method. CPR introduces only a minor runtime overhead and only requires setting an upper bound. We propose simple yet efficient mechanisms for initializing this bound, making CPR rely on no hyperparameter or one, akin to weight…
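The mechanism the abstract describes can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it uses plain gradient descent on a quadratic loss instead of a real optimizer and network, takes the squared L2-norm as the bounded measure, and applies the standard augmented Lagrangian multiplier update. The names `cpr_style_step`, `run`, `kappa`, `mu`, and `lr` are hypothetical and chosen only for this illustration.

```python
def squared_l2(w):
    """Squared L2-norm of a parameter vector, used here as the bounded measure."""
    return sum(x * x for x in w)


def cpr_style_step(w, grad, lam, kappa, mu=1.0, lr=0.1):
    """One gradient step with an adaptive, constraint-driven penalty.

    w     : current parameters (list of floats)
    grad  : gradient of the loss at w
    lam   : current multiplier for this parameter group
    kappa : assumed upper bound on squared_l2(w)
    """
    # Multiplier update: increases while the bound is violated,
    # decreases otherwise, and is clipped at zero.
    lam = max(0.0, lam + mu * (squared_l2(w) - kappa))
    # The gradient of the squared L2-norm is 2 * w, so the penalty
    # acts like weight decay with a coefficient that adapts over time.
    w = [x - lr * (g + lam * 2.0 * x) for x, g in zip(w, grad)]
    return w, lam


def run(target, kappa, steps=500):
    """Minimize 0.5 * ||w - target||^2 subject to squared_l2(w) <= kappa."""
    w = [0.0] * len(target)
    lam = 0.0
    for _ in range(steps):
        grad = [x - t for x, t in zip(w, target)]  # gradient of the toy loss
        w, lam = cpr_style_step(w, grad, lam, kappa)
    return w, lam


w, lam = run(target=[3.0, 4.0], kappa=1.0)
print(w, lam, squared_l2(w))
```

In this toy problem the unconstrained minimizer has squared norm 25, so the bound of 1 is active and the multiplier settles at a positive value. With a bound larger than 25 the multiplier would stay at zero and the penalty would vanish, which is the behavior a fixed weight decay coefficient cannot reproduce.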
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
Methods: Weight Decay
