LoRanPAC: Low-rank Random Features and Pre-trained Models for Bridging Theory and Practice in Continual Learning
Liangzu Peng, Juan Elenter, Joshua Agterberg, Alejandro Ribeiro, René Vidal

TL;DR
LoRanPAC is a stable, theoretically grounded continual learning method that leverages low-rank approximations of pre-trained features to achieve high performance across multiple tasks.
Contribution
It introduces a novel low-rank feature truncation approach with theoretical guarantees, bridging the gap between empirical performance and theoretical understanding in continual learning.
Findings
Outperforms state-of-the-art CL methods on multiple datasets
Handles hundreds of tasks with stability and high accuracy
Provides theoretical guarantees for training and generalization errors
Abstract
The goal of continual learning (CL) is to train a model that can solve multiple tasks presented sequentially. Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks. However, such methods lack theoretical guarantees, making them prone to unexpected failures. Conversely, principled CL approaches often fail to achieve competitive performance. In this work, we aim to bridge this gap between theory and practice by designing a simple CL method that is theoretically sound and highly performant. Specifically, we lift pre-trained features into a higher dimensional space and formulate an over-parametrized minimum-norm least-squares problem. We find that the lifted features are highly ill-conditioned, potentially leading to large training errors (numerical instability) and increased generalization errors. We address…
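The pipeline the abstract describes (lift the features, then solve a least-squares problem while guarding against ill-conditioning) can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the ReLU nonlinearity, the Gaussian lifting matrix `P`, the toy sizes, and the helper names `lift` and `tsvd_fit` are all hypothetical, and synthetic data stands in for pre-trained features. Truncating small singular values is used here as one standard remedy for ill-conditioning.

```python
import numpy as np

def lift(X, P):
    """Lift features X (d x n) to E dimensions: random projection, then ReLU (assumed)."""
    return np.maximum(P @ X, 0.0)

def tsvd_fit(H, Y, k):
    """Fit Y ≈ W @ H by least squares restricted to the top-k singular directions of H."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T
    # Truncated pseudoinverse solution: W = Y V_k diag(1/s_k) U_k^T
    return ((Y @ V_k) / s_k) @ U_k.T

rng = np.random.default_rng(0)
d, n, E, c = 16, 40, 200, 3            # toy sizes (assumed)
X = rng.standard_normal((d, n))        # stand-in for pre-trained features
Y = rng.standard_normal((c, n))        # stand-in for targets
P = rng.standard_normal((E, d))        # random lifting matrix
H = lift(X, P)                         # E x n lifted features

W_full = tsvd_fit(H, Y, k=n)           # keep every singular direction
W_trunc = tsvd_fit(H, Y, k=n // 2)     # discard the smallest half

err_full = np.linalg.norm(W_full @ H - Y)
err_trunc = np.linalg.norm(W_trunc @ H - Y)
print("condition number of H:", np.linalg.cond(H))
print("training error (full, truncated):", err_full, err_trunc)
print("weight norm (full, truncated):", np.linalg.norm(W_full), np.linalg.norm(W_trunc))
```

In this sketch, truncation can only increase the training residual and can only shrink the norm of `W`, so the choice of `k` trades fit against the size of the solution.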
Peer Reviews
Decision: ICLR 2025 Poster
The authors provided comprehensive theoretical and empirical results.
A simple baseline in continual learning is experience replay, but I did not see the authors provide results for any experience-replay-based method.
The problem and setup are well motivated. The results are natural and the writing is clear. The experimental results are extensive and seem significant.
It seems to me that the main contribution of this work is an algorithm for continual principal component regression, with an application to training with random features. Specifically, the properties of the features that arise from pretraining, random projections and nonlinearity don't seem to matter much to the derivations theoretically. On the other hand, if I view the work mainly as an algorithm for continual principal component regression, then the guarantees given in Theorems 1 and 2 are a bla
1. The paper clearly demonstrates the contribution of the proposed approach and is easy to follow. 2. Extensive experiments across multiple datasets show that ICL-TSVD remains stable regarding hyperparameter selection (including regularization parameter $\lambda$ and truncation percentage $\zeta$) and consistently outperforms previous CL baselines, particularly RanPAC. 3. It provides theoretical guarantees for the proposed method ICL-TSVD by revealing a recurrence relation throughout the CL pr
1. The continual implementation of $\overline{\mathbf{W}}_t$ involves the computation of $\widetilde{\mathbf{B}}_t$, which has the form $\widetilde{\mathbf{B}}_t = [\widetilde{\mathbf{U}}_{1:t-1}\widetilde{\mathbf{\Sigma}}_{1:t-1}, \widetilde{\mathbf{H}}_t]$ when $t\geq 2$ according to Eq. (3). Appendix C.1 briefly mentions that it is empirically found that continually updating all SVD factors $\widetilde{\mathbf{U}}_{1:t}$, $\widetilde{\mathbf{\Sigma}}_{1:t}$ and $\widetilde{\mathbf{V}}_{1:t}$ leads to large test errors. However,
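The update quoted in this review, an SVD of $[\mathbf{U}_{1:t-1}\mathbf{\Sigma}_{1:t-1}, \mathbf{H}_t]$, can be illustrated with a short NumPy sketch. It is only a sketch under assumptions: the helper name `update_factors`, the toy sizes, and the Gaussian stand-in features are hypothetical, and how the method builds the classifier from these factors is not shown in this excerpt. It relies on a standard fact: without truncation, this matrix has the same left singular vectors and singular values as the full concatenation of all features seen so far.

```python
import numpy as np

def update_factors(U_prev, s_prev, H_t, k):
    """One continual step: SVD of B_t = [U_prev * s_prev, H_t], keeping the top-k factors."""
    if U_prev is None:
        B = H_t                                   # first task: nothing to carry over
    else:
        B = np.hstack([U_prev * s_prev, H_t])     # U_prev * s_prev scales each column of U_prev
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :k], s[:k]                        # right singular vectors are not stored

rng = np.random.default_rng(0)
E = 50                                            # lifted dimension (toy value)
tasks = [rng.standard_normal((E, 20)) for _ in range(3)]  # 20 samples per task

U, s = None, None
for H_t in tasks:
    U, s = update_factors(U, s, H_t, k=E)         # k = E means no truncation

# Reference: singular values of all task features stacked side by side.
s_batch = np.linalg.svd(np.hstack(tasks), compute_uv=False)
print("max gap to batch singular values:", np.max(np.abs(s - s_batch)))
```

With `k = E` the incremental singular values match the batch ones up to floating-point error, while memory stays bounded by the size of `U` and `s` rather than growing with the number of samples. Choosing a smaller `k` makes each step an approximation.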
Taxonomy
Topics: Speech Recognition and Synthesis · COVID-19 diagnosis using AI · Machine Learning in Healthcare
