Improving Deep Learning Optimization through Constrained Parameter Regularization

Jörg K.H. Franke, Michael Hefenbrock, Gregor Koehler, Frank Hutter

arXiv:2311.09058 · cs.LG · December 10, 2024 · 2 cites

TL;DR

This paper introduces Constrained Parameter Regularization (CPR), a regularization method that enforces an upper bound on the norm of each individual parameter matrix. It is proposed as an alternative to traditional weight decay, which applies a single uniform penalty to all parameters, with the aim of improving deep learning training.
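To make the contrast concrete, the two training objectives can be written side by side. This is a sketch in assumed notation rather than the paper's own formulas: θ^j is the j-th of J parameter matrices, L the training loss, γ the weight-decay coefficient, R the bounded statistical measure (for example the squared L2 norm) and κ^j the upper bound for matrix j.

```latex
\begin{aligned}
\text{Weight decay:}\quad & \min_{\theta}\; L(\theta) + \frac{\gamma}{2}\sum_{j=1}^{J} \lVert \theta^{j} \rVert_2^2 \\
\text{CPR:}\quad & \min_{\theta}\; L(\theta) \quad \text{s.t.} \quad R(\theta^{j}) \le \kappa^{j}, \qquad j = 1, \dots, J
\end{aligned}
```

The single coefficient γ pulls every matrix toward zero with the same strength, whereas each bound κ^j only becomes active once matrix j exceeds it.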

Contribution

CPR constrains the norm of each parameter matrix individually and handles the resulting constrained problem with an adaptation of the augmented Lagrangian method. It adds only minor runtime overhead and does not require extensive hyperparameter tuning.
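The augmented Lagrangian idea can be illustrated on a single parameter matrix. The sketch below is a simplified, hypothetical version and not the automl/cpr implementation: it assumes R is the squared L2 norm, flattens the matrix into a Python list, uses plain SGD as the base optimizer, and takes one gradient step on the augmented Lagrangian (in its standard form for an inequality constraint) after the usual multiplier update. The names `cpr_step`, `lr` and `mu` are illustrative assumptions.

```python
def squared_l2(params):
    """R(theta): the measure being bounded, here the squared L2 norm."""
    return sum(p * p for p in params)


def cpr_step(params, grads, lam, kappa, lr=0.1, mu=0.01):
    """One hypothetical CPR-style step for a single (flattened) parameter matrix.

    params: current parameter values
    grads:  gradient of the training loss w.r.t. params
    lam:    Lagrange multiplier for the constraint R(params) <= kappa
    Returns (new_params, new_lam).
    """
    # Constraint violation c = R(theta) - kappa (negative means satisfied).
    violation = squared_l2(params) - kappa
    # Multiplier update for an inequality constraint: grows while the bound
    # is violated, shrinks otherwise, and never drops below zero.
    lam = max(0.0, lam + mu * violation)
    # Gradient step on the augmented Lagrangian: loss gradient plus
    # lam * grad R, where grad R = 2 * theta for the squared L2 norm.
    # With lam == 0 this reduces to a plain, unregularized SGD step.
    new_params = [p - lr * (g + lam * 2.0 * p) for p, g in zip(params, grads)]
    return new_params, lam


# Usage: the loss 0.5 * ||theta - target||^2 pulls theta toward a point
# with squared norm 25, while the constraint caps the squared norm at 1.
target = [3.0, 4.0]
theta, lam = [0.0, 0.0], 0.0
for _ in range(3000):
    grads = [p - t for p, t in zip(theta, target)]
    theta, lam = cpr_step(theta, grads, lam, kappa=1.0)
print(theta, lam)  # theta ends close to [0.6, 0.8], lam close to 2.0
```

Unlike weight decay, the regularization pull here is zero while the matrix stays within its bound and grows only as long as the bound is exceeded, so each matrix effectively receives its own adaptive penalty strength.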

Findings

01. CPR outperforms weight decay in vision and language tasks.
02. CPR enhances pre-training and fine-tuning performance.
03. CPR maintains low computational overhead.

Abstract

Regularization is a critical component in deep learning. The most commonly used approach, weight decay, applies a constant penalty coefficient uniformly across all parameters. This may be overly restrictive for some parameters while insufficient for others. To address this, we present Constrained Parameter Regularization (CPR) as an alternative to traditional weight decay. Instead of applying a single penalty uniformly, CPR enforces an upper bound on a statistical measure, such as the L2-norm, of individual parameter matrices. Consequently, learning becomes a constrained optimization problem, which we tackle using an adaptation of the augmented Lagrangian method. CPR introduces only a minor runtime overhead and only requires setting an upper bound. We propose simple yet efficient mechanisms for initializing this bound, so that CPR relies on either no hyperparameter or just one, akin to weight…
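The abstract is cut off before it describes the bound-initialization mechanisms, so the following only illustrates the general idea with three hypothetical rules; the function names and heuristics are assumptions, not the paper's definitions. The rules are: a bound proportional to the matrix's measure at initialization (one hyperparameter, the factor); the value reached after a fixed number of unconstrained warm-start steps (one hyperparameter, the step count); and a hyperparameter-free rule that freezes the bound once the growth of R starts to slow down.

```python
def kappa_from_init(initial_r, factor):
    """Hypothetical rule, one hyperparameter: a multiple of R at initialization."""
    return factor * initial_r


def kappa_after_warm_start(r_history, warm_start_steps):
    """Hypothetical rule, one hyperparameter: train without the constraint for
    `warm_start_steps` steps, then freeze the bound at the value R reached.

    Returns None while the warm start is still running (constraint inactive).
    """
    if len(r_history) < warm_start_steps:
        return None
    return r_history[warm_start_steps - 1]


def kappa_at_slowdown(r_history):
    """Hypothetical hyperparameter-free rule: freeze the bound at the first step
    where the growth of R starts to slow down, detected as a negative discrete
    second difference of the recorded values.

    Returns None if no slowdown has been observed yet.
    """
    for i in range(2, len(r_history)):
        second_diff = r_history[i] - 2 * r_history[i - 1] + r_history[i - 2]
        if second_diff < 0:
            return r_history[i]
    return None


# Usage with an illustrative, made-up trajectory of R for one matrix.
history = [1.0, 2.0, 4.0, 7.0, 9.0, 10.0]
print(kappa_from_init(history[0], 4.0))    # 4.0
print(kappa_after_warm_start(history, 3))  # 4.0
print(kappa_at_slowdown(history))          # 9.0
```

Each rule is evaluated per parameter matrix, which is what lets a single (or no) hyperparameter yield a different bound for every matrix.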

Code & Models

Repositories

automl/cpr (PyTorch, official)

Videos

Improving Deep Learning Optimization through Constrained Parameter Regularization · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning

Methods: Weight Decay