ELLA: Efficient Lifelong Learning for Adapters in Large Language Models
Shristi Das Biswas, Yue Zhang, Anwesan Pal, Radhika Bhargava, Kaushik Roy

TL;DR
ELLA is a scalable lifelong learning framework for large language models that mitigates catastrophic forgetting without data replay or model expansion, using a subspace de-correlation regularizer that still permits transfer between tasks.
Contribution
ELLA introduces a new regularization-based method for lifelong learning in LLMs that selectively de-correlates task-specific updates, outperforming existing methods in efficiency and accuracy.
Findings
Achieves state-of-the-art continual learning performance on benchmarks.
Reduces memory footprint by up to 35 times compared to previous methods.
Enhances zero-shot generalization to unseen tasks.
Abstract
Large Language Models (LLMs) suffer severe catastrophic forgetting when adapted sequentially to new tasks in a continual learning (CL) setting. Existing approaches are fundamentally limited. Replay-based methods are impractical and privacy-violating, while strict orthogonality-based methods collapse under scale: each new task is projected onto an orthogonal complement, which progressively reduces the residual degrees of freedom and eliminates forward transfer by forbidding overlap in shared representations. In this work, we introduce ELLA, a training framework built on the principle of selective subspace de-correlation. Rather than forbidding all overlap, ELLA explicitly characterizes the structure of past updates and penalizes alignments along their high-energy, task-specific directions, while preserving freedom in the low-energy residual subspaces to enable transfer. Formally, this is…
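The abstract is cut off before the formal definition, so the following is only an illustrative sketch of the general idea of selective de-correlation, not the paper's actual regularizer. It assumes the high-energy directions of past updates can be taken from the top singular vectors of an accumulated past weight update, and that the penalty is the squared Frobenius norm of the new update's projection onto those directions. The function names `high_energy_basis` and `decorrelation_penalty`, the `energy` threshold, and the toy matrices are all hypothetical.

```python
import numpy as np


def high_energy_basis(past_update, energy=0.9):
    """Left singular vectors of `past_update` that together hold at least
    `energy` of its total squared singular value mass (assumed definition
    of "high-energy directions")."""
    U, s, _ = np.linalg.svd(past_update, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return U[:, :k]


def decorrelation_penalty(new_update, basis):
    """Squared Frobenius norm of the part of `new_update` that lies in
    span(basis). Components outside that span are not penalized."""
    return float(np.linalg.norm(basis.T @ new_update) ** 2)


# Toy past update: a strong direction e1 (singular value 10) and a weak
# direction e2 (singular value 1) in a 4-dimensional output space.
e1 = np.array([1.0, 0.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0, 0.0])
past = 10.0 * np.outer(e1, [1.0, 0.0, 0.0]) + 1.0 * np.outer(e2, [0.0, 1.0, 0.0])

basis = high_energy_basis(past, energy=0.9)  # keeps only the strong direction

# A new update aligned with the strong past direction is penalized ...
penalty_strong = decorrelation_penalty(np.outer(e1, np.ones(3)), basis)
# ... while one aligned with the weak past direction is left free.
penalty_weak = decorrelation_penalty(np.outer(e2, np.ones(3)), basis)

print(basis.shape, penalty_strong, penalty_weak)
```

In this toy setup the update along the strong direction receives a penalty of about 3.0, while the update along the weak direction receives a penalty of about 0 even though it overlaps with a past update. A strict orthogonality constraint would forbid both, which is the contrast the abstract draws.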
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education · Face recognition and analysis
