Are Greedy Task Orderings Better Than Random in Continual Linear Regression?
Matan Tsipory, Ran Levinstein, Itay Evron, Mark Kong, Deanna Needell, Daniel Soudry

TL;DR
This paper investigates whether greedy task orderings outperform random orderings in continual linear regression, providing theoretical bounds and empirical evidence that greedy orderings can converge faster but may also fail catastrophically under certain conditions.
Contribution
The paper formalizes greedy task orderings using Kaczmarz methods, compares their convergence properties to random orderings, and highlights conditions where greedy methods excel or fail.
Findings
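In experiments, greedy orderings converge faster than random ones in terms of the average loss across tasks, both for linear regression with random data and for linear probing on CIFAR-100 classification tasks.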
In a high-rank regression setting, the paper proves a loss bound for greedy orderings analogous to the known bound for random orderings.
Under general rank, the comparison depends on repetition: single-pass greedy orderings can fail catastrophically, while greedy orderings that allow repetition have better convergence rates.
Abstract
We analyze task orderings in continual learning for linear regression, assuming joint realizability of training data. We focus on orderings that greedily maximize dissimilarity between consecutive tasks, a concept briefly explored in prior work but still surrounded by open questions. Using tools from the Kaczmarz method literature, we formalize such orderings and develop geometric and algebraic intuitions around them. Empirically, we demonstrate that greedy orderings converge faster than random ones in terms of the average loss across tasks, both for linear regression with random data and for linear probing on CIFAR-100 classification tasks. Analytically, in a high-rank regression setting, we prove a loss bound for greedy orderings analogous to that of random ones. However, under general rank, we establish a repetition-dependent separation. Specifically, while prior work showed that for…
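The setup described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the helper names (`make_tasks`, `fit_task`, `average_loss`, `run`) are hypothetical, each task is fit with a minimum-norm (projection-style) update as in block Kaczmarz methods, and the greedy rule used here (pick the task whose update would move the current iterate the farthest) is one plausible reading of "maximize dissimilarity", which may differ from the paper's formal definition.

```python
import numpy as np


def make_tasks(num_tasks=20, rows=3, dim=30, seed=0):
    """Random regression tasks that share one solution w_star (joint realizability)."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(dim)
    tasks = []
    for _ in range(num_tasks):
        X = rng.standard_normal((rows, dim))
        tasks.append((X, X @ w_star))
    return tasks, w_star


def fit_task(w, X, y):
    """Minimum-norm update: project w onto the solution set {v : X v = y}."""
    return w + np.linalg.pinv(X) @ (y - X @ w)


def average_loss(w, tasks):
    """Mean squared error averaged over all tasks."""
    return float(np.mean([np.mean((X @ w - y) ** 2) for X, y in tasks]))


def run(tasks, steps, rule="greedy", repeats=True, seed=0):
    """Train sequentially for up to `steps` tasks; return final iterate and loss curve.

    rule="greedy": ASSUMED rule, choose the task whose update moves w the farthest.
    rule="random": choose uniformly among the available tasks.
    repeats=False: single pass, each task is used at most once.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(tasks[0][0].shape[1])
    available = list(range(len(tasks)))
    losses = [average_loss(w, tasks)]
    for _ in range(steps):
        if not available:  # single pass exhausted every task
            break
        if rule == "greedy":
            moves = [np.linalg.norm(fit_task(w, *tasks[i]) - w) for i in available]
            pick = available[int(np.argmax(moves))]
        else:
            pick = available[int(rng.integers(len(available)))]
        w = fit_task(w, *tasks[pick])
        if not repeats:
            available.remove(pick)
        losses.append(average_loss(w, tasks))
    return w, losses


tasks, w_star = make_tasks()
for rule in ("greedy", "random"):
    _, losses = run(tasks, steps=40, rule=rule, repeats=True)
    print(rule, "final average loss:", losses[-1])
```

Passing `repeats=False` restricts the run to a single pass over the tasks, which is the regime where the summary above says greedy orderings can fail. The sketch makes no claim about which ordering wins on any particular draw of the data; it only provides a harness for comparing the loss curves.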
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference
