ELLA: Efficient Lifelong Learning for Adapters in Large Language Models

Shristi Das Biswas; Yue Zhang; Anwesan Pal; Radhika Bhargava; Kaushik Roy

arXiv:2601.02232 · cs.LG · January 8, 2026

TL;DR

ELLA is a scalable lifelong learning framework for large language models that mitigates catastrophic forgetting without replay or expansion, using a novel subspace de-correlation regularizer to enable transfer and improve performance.

Contribution

ELLA introduces a new regularization-based method for lifelong learning in LLMs that selectively de-correlates task-specific updates, outperforming existing methods in efficiency and accuracy.

Findings

01. Achieves state-of-the-art continual learning performance on benchmarks.

02. Reduces memory footprint by up to 35 times compared to previous methods.

03. Enhances zero-shot generalization to unseen tasks.

Abstract

Large Language Models (LLMs) suffer severe catastrophic forgetting when adapted sequentially to new tasks in a continual learning (CL) setting. Existing approaches are fundamentally limited. Replay-based methods are impractical and privacy-violating, while strict orthogonality-based methods collapse under scale: each new task is projected onto an orthogonal complement, which progressively reduces the residual degrees of freedom and eliminates forward transfer by forbidding overlap in shared representations. In this work, we introduce ELLA, a training framework built on the principle of selective subspace de-correlation. Rather than forbidding all overlap, ELLA explicitly characterizes the structure of past updates and penalizes alignments along their high-energy, task-specific directions, while preserving freedom in the low-energy residual subspaces to enable transfer. Formally, this is…
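The abstract's formal definition is cut off in this listing, so the following is not ELLA's regularizer. It is a minimal NumPy sketch of one way the general idea could be read: find the high-energy directions of a past weight update and penalize a new update only for its components along them. The use of an SVD, the energy threshold, the energy weighting, and the names `high_energy_basis`, `decorrelation_penalty`, and `energy_fraction` are all assumptions made for illustration.

```python
import numpy as np


def high_energy_basis(past_update, energy_fraction=0.9):
    """Hypothetical helper: left singular vectors of a past update that
    together hold `energy_fraction` of its squared singular-value energy,
    plus each kept direction's share of the total energy."""
    u, s, _ = np.linalg.svd(past_update, full_matrices=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0.0:
        # No past signal: nothing to protect.
        return u[:, :0], energy[:0]
    cumulative = np.cumsum(energy) / total
    # Smallest number of directions whose cumulative energy reaches the threshold.
    k = min(int(np.searchsorted(cumulative, energy_fraction)) + 1, len(s))
    return u[:, :k], energy[:k] / total


def decorrelation_penalty(new_update, basis, weights):
    """Hypothetical penalty: energy-weighted squared projection of the new
    update onto the protected directions. Components outside `basis`
    contribute nothing, so the residual subspace stays unconstrained."""
    coeffs = basis.T @ new_update  # shape (k, d_in)
    return float(np.sum(weights[:, None] * coeffs ** 2))


# Toy check with a rank-1 past update along the first output coordinate.
rng = np.random.default_rng(0)
d_out, d_in = 6, 4
first = np.zeros(d_out)
first[0] = 1.0
second = np.zeros(d_out)
second[1] = 1.0

past = np.outer(first, rng.standard_normal(d_in))
basis, weights = high_energy_basis(past)

aligned = past.copy()                                    # reuses the protected direction
orthogonal = np.outer(second, rng.standard_normal(d_in)) # avoids it entirely

penalty_aligned = decorrelation_penalty(aligned, basis, weights)
penalty_orthogonal = decorrelation_penalty(orthogonal, basis, weights)
```

In this toy setup the aligned update is penalized while the orthogonal one is essentially free. A strict orthogonality constraint would instead remove every protected direction from the space available to later tasks, which is the loss of capacity the abstract describes.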

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education · Face recognition and analysis