Optimal Rates in Continual Linear Regression via Increasing Regularization
Ran Levinstein, Amit Attia, Matan Schliserman, Uri Sherman, Tomer Koren, Daniel Soudry, Itay Evron

TL;DR
This paper shows that regularization in continual linear regression narrows, and can close, the gap between the known lower bound and previous upper bounds. A fixed regularization strength achieves a near-optimal convergence rate, and an increasing regularization schedule achieves the optimal rate. The paper also suggests practical strategies for choosing regularization schedules.
Contribution
The paper proves that increasing regularization schemes can attain optimal convergence rates in continual linear regression, improving upon prior bounds and guiding practical regularization schedules.
Findings
Regularization narrows the gap between the known lower bound and prior upper bounds on the convergence rate in continual linear regression.
A fixed regularization strength achieves a near-optimal $O(\frac{\log k}{k})$ rate.
An increasing regularization schedule can attain the optimal $O(1/k)$ rate.
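The findings above concern a per-task update that fits the new task while penalizing distance from the previous solution. The following is a minimal sketch under several assumptions. Each task is a single equation $x^\top w = y$, whereas the paper treats general linear tasks. The task collection is a realizable toy example. The random ordering is uniform sampling with replacement. The names `regularized_task_update` and `run` are hypothetical, and so are the schedules `lambda t: 1.0` and `lambda t: 0.1 * t`. They are illustrations only, not the strength or schedule derived in the paper.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def regularized_task_update(w_prev, x, y, lam):
    """Exact minimizer of (x.w - y)^2 + lam * ||w - w_prev||^2 for a one-row task.

    The minimizer moves w_prev along x and shrinks the task residual by the
    factor lam / (lam + ||x||^2). lam = 0 gives the unregularized scheme, i.e. an
    orthogonal projection onto the task's solution set {w : x.w = y}.
    """
    residual = dot(x, w_prev) - y
    scale = residual / (lam + dot(x, x))
    return [wi - scale * xi for wi, xi in zip(w_prev, x)]


def average_loss(w, tasks):
    """Average squared residual over the whole task collection."""
    return sum((dot(x, w) - y) ** 2 for x, y in tasks) / len(tasks)


def run(tasks, k, schedule, seed=0):
    """Learn k tasks drawn uniformly at random (with replacement) from `tasks`.

    `schedule(t)` returns the regularization strength used at iteration t.
    """
    rng = random.Random(seed)
    w = [0.0] * len(tasks[0][0])
    for t in range(1, k + 1):
        x, y = rng.choice(tasks)
        w = regularized_task_update(w, x, y, schedule(t))
    return average_loss(w, tasks)


# Hypothetical realizable collection: every label is generated by w_star.
w_star = [1.0, -2.0, 0.5]
rows = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
tasks = [(x, dot(x, w_star)) for x in rows]

fixed_loss = run(tasks, k=1000, schedule=lambda t: 1.0)
increasing_loss = run(tasks, k=1000, schedule=lambda t: 0.1 * t)
print(fixed_loss, increasing_loss)
```

The script prints the final average loss under each schedule. On a three-task toy instance, these numbers say nothing about the worst-case rates that the bounds above quantify. The closed form is used only to keep the sketch dependency-free.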
Abstract
We study realizable continual linear regression under random task orderings, a common setting for developing continual learning theory. In this setup, the worst-case expected loss after $k$ learning iterations admits a lower bound of $\Omega(1/k)$. However, prior work using an unregularized scheme has only established an upper bound of $O(1/k^{1/4})$, leaving a significant gap. Our paper proves that this gap can be narrowed, or even closed, using two frequently used regularization schemes: (1) explicit isotropic regularization, and (2) implicit regularization via finite step budgets. We show that these approaches, which are used in practice to mitigate forgetting, reduce to stochastic gradient descent (SGD) on carefully defined surrogate losses. Through this lens, we identify a fixed regularization strength that yields a near-optimal rate of $O(\log k / k)$. Moreover,…
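The second scheme named in the abstract, implicit regularization via a finite step budget, can be illustrated in a one-equation setting. This is a minimal sketch under stated assumptions. Each task is a single equation $x^\top w = y$. Learning a task means running `steps` gradient-descent steps with step size `eta`, starting from the previous iterate. `matched_lambda` is a hypothetical helper. In this special case, both schemes move the iterate along $x$, so a budget of $s$ steps coincides exactly with explicit regularization at a matched strength. This is a toy identity, not the paper's surrogate-loss reduction.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def finite_budget_update(w_prev, x, y, eta, steps):
    """Run `steps` gradient-descent steps on 0.5 * (x.w - y)^2, starting at w_prev."""
    w = list(w_prev)
    for _ in range(steps):
        residual = dot(x, w) - y
        w = [wi - eta * residual * xi for wi, xi in zip(w, x)]
    return w


def regularized_task_update(w_prev, x, y, lam):
    """Exact minimizer of (x.w - y)^2 + lam * ||w - w_prev||^2 for a one-row task."""
    residual = dot(x, w_prev) - y
    scale = residual / (lam + dot(x, x))
    return [wi - scale * xi for wi, xi in zip(w_prev, x)]


def matched_lambda(x, eta, steps):
    """Hypothetical helper: the explicit strength that reproduces the budgeted iterate.

    Each GD step multiplies the residual by (1 - eta * ||x||^2), while the explicit
    update multiplies it by lam / (lam + ||x||^2). Equating the two factors gives
    lam = q * shrink / (1 - shrink). Assumes 0 < eta * ||x||^2 < 1 and steps >= 1.
    """
    q = dot(x, x)
    shrink = (1.0 - eta * q) ** steps  # fraction of the residual left after the budget
    return q * shrink / (1.0 - shrink)


x, y, w0, eta = [1.0, 2.0], 3.0, [0.0, 0.0], 0.1
for steps in (1, 2, 4, 8):
    lam = matched_lambda(x, eta, steps)
    budgeted = finite_budget_update(w0, x, y, eta, steps)
    explicit = regularized_task_update(w0, x, y, lam)
    print(steps, round(lam, 4), budgeted, explicit)
```

With these assumed values, the matched strength falls from 5.0 at one step to about 0.0196 at eight steps. In this toy case, a smaller step budget therefore acts like stronger explicit regularization.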
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
