Adaptive Regularization for Sparsity Control in Bregman-Based Optimizers

Ahmad Aloradi; Tim Roith; Emanuël A. P. Habets; Daniel Tenbrinck

arXiv:2605.07892·cs.LG·May 21, 2026

TL;DR

This paper introduces an adaptive regularization method for Bregman-based optimizers that effectively controls sparsity levels in neural network training, reducing the need for trial-and-error parameter tuning.

Contribution

The authors propose an adaptive regularization scheme that dynamically adjusts the regularization parameter to reliably achieve target sparsity levels in deep neural networks.

Findings

01

The adaptive method reliably achieves sparsity targets between 75% and 99%.

02

It converges faster than non-adaptive baselines during early training.

03

The scheme improves out-of-distribution robustness over dense baselines.

Abstract

Sparse training reduces the memory and computational costs of deep neural networks. However, sparse optimization methods, e.g., those adding an $ℓ_{1}$ penalty, often control sparsity only indirectly through a regularization parameter $λ$, whose mapping to the final sparsity rate is non-trivial. In our experiments, we found this parameter sensitivity to be particularly pronounced for Bregman-based optimizers. Specifically, the two variants LinBreg and AdaBreg reach the same sparsity at $λ$ values that differ by up to two orders of magnitude, requiring expensive trial-and-error sweeps to achieve a user-specified sparsity. To address this, we propose an adaptive regularization scheme that updates $λ$ based on the difference between the model's current sparsity and the target sparsity. We analyze the resulting algorithm and evaluate it on automatic speaker verification…
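
The abstract describes the feedback idea (adjust $λ$ using the gap between current and target sparsity) but does not give the update rule in the portion shown here. The following is a minimal sketch of one plausible form, not the authors' algorithm: it assumes a multiplicative update with an exponential gain, and it uses plain soft thresholding on a fixed score vector as a stand-in for the sparsifying step of an $ℓ_{1}$-regularized optimizer. The names `update_lambda`, `tune_lambda`, `gain`, `lam_min`, and `lam_max` are hypothetical.

```python
import math


def soft_threshold(v, lam):
    """Shrink v toward zero by lam (the proximal map of lam * |.|)."""
    return math.copysign(max(abs(v) - lam, 0.0), v)


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def update_lambda(lam, current, target, gain=1.0, lam_min=1e-8, lam_max=1e3):
    """Assumed rule: raise lam when the model is too dense, lower it when too sparse.

    The multiplicative form keeps lam positive and lets it move across
    orders of magnitude; the clamp bounds are arbitrary safeguards.
    """
    error = target - current
    new_lam = lam * math.exp(gain * error)
    return min(max(new_lam, lam_min), lam_max)


def tune_lambda(scores, target, lam=0.01, steps=200):
    """Toy loop: threshold fixed scores, measure sparsity, adapt lam."""
    current = 0.0
    for _ in range(steps):
        weights = [soft_threshold(s, lam) for s in scores]
        current = sparsity(weights)
        lam = update_lambda(lam, current, target)
    return lam, current


if __name__ == "__main__":
    # Hypothetical score vector: 100 evenly spaced magnitudes in (0, 1].
    scores = [i / 100 for i in range(1, 101)]
    lam, reached = tune_lambda(scores, target=0.9)
    print(f"lambda={lam:.4f}, sparsity={reached:.2f}")
```

In a real training run the sparsity would be measured on the network weights after each optimizer step (or every few steps), and the gain would trade off tracking speed against oscillation around the target; both choices are left open here.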
