Optimal Rates in Continual Linear Regression via Increasing Regularization

Ran Levinstein; Amit Attia; Matan Schliserman; Uri Sherman; Tomer Koren; Daniel Soudry; Itay Evron

arXiv:2506.06501·cs.LG·October 28, 2025

TL;DR

This paper shows that regularization in continual linear regression narrows the gap between the known lower bound and previous upper bounds on the convergence rate: a fixed regularization strength achieves a near-optimal rate, an increasing schedule achieves the optimal rate, and the analysis suggests practical regularization schedules.

Contribution

The paper proves that increasing regularization schemes can attain optimal convergence rates in continual linear regression, improving upon prior bounds and guiding practical regularization schedules.

Findings

01

Regularization narrows the gap between the $Ω(1/k)$ lower bound and the prior $O(1/k^{1/4})$ upper bound in continual linear regression.

02

A fixed regularization strength achieves a near-optimal $O(\frac{\log k}{k})$ rate.

03

An increasing regularization schedule can attain the optimal $O(1/k)$ rate.

Abstract

We study realizable continual linear regression under random task orderings, a common setting for developing continual learning theory. In this setup, the worst-case expected loss after $k$ learning iterations admits a lower bound of $Ω(1/k)$. However, prior work using an unregularized scheme has only established an upper bound of $O(1/k^{1/4})$, leaving a significant gap. Our paper proves that this gap can be narrowed, or even closed, using two frequently used regularization schemes: (1) explicit isotropic $ℓ_{2}$ regularization, and (2) implicit regularization via finite step budgets. We show that these approaches, which are used in practice to mitigate forgetting, reduce to stochastic gradient descent (SGD) on carefully defined surrogate losses. Through this lens, we identify a fixed regularization strength that yields a near-optimal rate of $O(\log k / k)$. Moreover,…
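To make the setting concrete, here is a minimal NumPy sketch of realizable continual linear regression with explicit isotropic $ℓ_{2}$ regularization under a random task ordering. It is an illustration under stated assumptions, not the authors' code: it assumes the regularizer pulls each new solution toward the previous iterate, and the names `make_tasks`, `regularized_step`, `average_loss`, and `run`, the task sizes, and the example schedules are all hypothetical. In particular, the increasing schedule shown is only a placeholder and is not the schedule analyzed in the paper.

```python
import numpy as np


def make_tasks(num_tasks=5, n=3, d=10, seed=0):
    """Build jointly realizable regression tasks (X_m, y_m) with y_m = X_m @ w_star."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    tasks = []
    for _ in range(num_tasks):
        X = rng.standard_normal((n, d))
        X /= np.linalg.norm(X, 2)  # scale so the spectral norm of X is 1
        tasks.append((X, X @ w_star))
    return tasks, w_star


def regularized_step(w_prev, X, y, lam):
    """Closed-form minimizer of 0.5*||X w - y||^2 + 0.5*lam*||w - w_prev||^2.

    Assumption: the isotropic l2 penalty is centered at the previous iterate.
    The result is written as a correction added to w_prev.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return w_prev + np.linalg.solve(A, X.T @ (y - X @ w_prev))


def average_loss(w, tasks):
    """Average squared loss of w over all tasks."""
    return float(np.mean([np.sum((X @ w - y) ** 2) for X, y in tasks]))


def run(tasks, k, schedule, seed=1):
    """Run k iterations; each iteration samples a task uniformly at random."""
    rng = np.random.default_rng(seed)
    w = np.zeros(tasks[0][0].shape[1])
    for t in range(1, k + 1):
        X, y = tasks[rng.integers(len(tasks))]
        w = regularized_step(w, X, y, schedule(t))
    return w


if __name__ == "__main__":
    tasks, _ = make_tasks()
    w_fixed = run(tasks, 200, lambda t: 0.1)             # fixed strength (illustrative value)
    w_incr = run(tasks, 200, lambda t: 0.1 * np.sqrt(t))  # increasing (placeholder schedule)
    print("fixed:", average_loss(w_fixed, tasks))
    print("increasing:", average_loss(w_incr, tasks))
```

A larger regularization strength makes each update a smaller correction to the previous iterate, which is the sense in which the strength behaves like an inverse step size in this sketch.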

Videos

Optimal Rates in Continual Linear Regression via Increasing Regularization· slideslive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques