Optimal L2 Regularization in High-dimensional Continual Linear Regression

Gilad Karpel; Edward Moroshko; Ran Levinstein; Ron Meir; Daniel Soudry; Itay Evron

arXiv:2601.13844·cs.LG·April 14, 2026

TL;DR

This paper derives a closed-form expression for the expected generalization loss in high-dimensional continual linear regression, revealing how the optimal L2 regularization strength scales with the number of tasks and how regularization mitigates label noise.

Contribution

To the authors' knowledge, it provides the first theoretical result in continual learning showing that the optimal regularization strength scales as $T/\ln T$, where $T$ is the number of tasks; experiments validate the result.
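To see why $T/\ln T$ counts as "nearly linear", it helps to tabulate it: its ratio to $T$ is $1/\ln T$, which shrinks only logarithmically. The sketch below evaluates the growth rate alone; the helper name `lambda_scale` is hypothetical, and the constant prefactor of the optimal strength is not modeled here.

```python
import math


def lambda_scale(T: int) -> float:
    """Return T / ln T, the growth rate of the optimal regularization
    strength (up to a constant factor this sketch does not model)."""
    if T < 2:
        raise ValueError("T / ln T is only meaningful for T >= 2")
    return T / math.log(T)


for T in (10, 100, 1_000, 10_000):
    scale = lambda_scale(T)
    # scale / T equals 1 / ln T, which decays very slowly as T grows.
    print(f"T={T:>6}  T/lnT={scale:10.2f}  ratio to T={scale / T:.3f}")
```

Between $T=10$ and $T=100$ the ratio to $T$ exactly halves, because $\ln 100 = 2\ln 10$; the factor $1/\ln T$ is the only gap between this growth rate and strictly linear growth.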

Findings

01

Optimal regularization mitigates label noise in continual learning.

02

The optimal regularization strength scales nearly linearly with the number of tasks $T$, as $T/\ln T$.

03

Experiments validate the theoretical findings, including the predicted scaling law and its effect on generalization.

Abstract

We study generalization in an overparameterized continual linear regression setting, where a model is trained with L2 (isotropic) regularization across a sequence of tasks. We derive a closed-form expression for the expected generalization loss in the high-dimensional regime that holds for arbitrary linear teachers. We demonstrate that isotropic regularization mitigates label noise under both single-teacher and multiple i.i.d. teacher settings, whereas prior work accommodating multiple teachers either did not employ regularization or used memory-demanding methods. Furthermore, we prove that the optimal fixed regularization strength scales nearly linearly with the number of tasks $T$, specifically as $T/\ln T$. To our knowledge, this is the first such result in theoretical continual learning. Finally, we validate our theoretical findings through experiments on linear regression and…
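A minimal simulation sketch of this setting, under stated assumptions that are not taken from the paper: Gaussian features $x \sim \mathcal{N}(0, I_d)$, a single linear teacher $w^\star$, additive Gaussian label noise, and an isotropic L2 penalty that pulls each task's solution toward the previous one, i.e. $w_t = \arg\min_w \|X_t w - y_t\|^2 + \lambda \|w - w_{t-1}\|^2$. The loss normalization and every function name (`make_tasks`, `continual_ridge`, `generalization_loss`) are hypothetical illustrations. Each update uses the identity $(X^\top X + \lambda I)^{-1} X^\top = X^\top (X X^\top + \lambda I)^{-1}$, so it solves only an $n \times n$ system, which is cheap in the overparameterized regime $n < d$.

```python
import numpy as np


def make_tasks(T, n, d, noise_std, rng):
    """Hypothetical generator: T tasks of n samples in R^d, all labeled by
    one shared linear teacher plus Gaussian label noise."""
    w_star = rng.standard_normal(d) / np.sqrt(d)  # teacher with norm ~ 1
    tasks = []
    for _ in range(T):
        X = rng.standard_normal((n, d))
        y = X @ w_star + noise_std * rng.standard_normal(n)
        tasks.append((X, y))
    return w_star, tasks


def continual_ridge(tasks, lam, d):
    """Train sequentially; each step solves (assumed regularizer)
    min_w ||X w - y||^2 + lam * ||w - w_prev||^2 in closed form."""
    w = np.zeros(d)
    for X, y in tasks:
        n = X.shape[0]
        # Push-through identity: an n x n solve instead of a d x d one.
        alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y - X @ w)
        w = w + X.T @ alpha
    return w


def generalization_loss(w, w_star):
    """Excess test loss E_x[(x^T w - x^T w_star)^2] for x ~ N(0, I_d)."""
    return float(np.sum((w - w_star) ** 2))


rng = np.random.default_rng(0)
T, n, d, noise_std = 20, 10, 200, 0.5
w_star, tasks = make_tasks(T, n, d, noise_std, rng)
for lam in (1e-6, 1.0, 10.0, 100.0, 1000.0):
    w = continual_ridge(tasks, lam, d)
    print(f"lambda={lam:>8g}  loss={generalization_loss(w, w_star):.4f}")
```

Sweeping `lam` over a grid, as in the final loop, is one way to locate an empirically good regularization strength for a given $T$; repeating the sweep for several values of $T$ would allow a comparison against the $T/\ln T$ trend, though the outcome depends on the assumed setup.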
