CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

Md Anwar Hossen; Fatema Siddika; Juan Pablo Munoz; Tanya Roosta; Ali Jannesari

arXiv:2605.05732·cs.LG·May 11, 2026

TL;DR

CRAFT is a continual learning framework for large language models that uses low-rank interventions and KL divergence to prevent forgetting while adapting to new tasks.

Contribution

CRAFT avoids weight updates by learning low-rank interventions on hidden representations, and it unifies routing, regularization, and merging under a single KL-based objective.
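To make the idea of an intervention on a hidden representation concrete, here is a minimal sketch in plain Python. It assumes a simple additive rank-r edit of the form h + up·(down·h); the paper's exact parameterization is not given in this summary, so the function `intervene` and the matrices `down` and `up` are hypothetical names used only for illustration.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def intervene(h, down, up):
    """Return h + up @ (down @ h), a rank-r additive edit of hidden state h.

    ASSUMPTION: this additive form is illustrative, not the paper's definition.
    down has shape (r, d) and up has shape (d, r). Only these two small
    matrices would be trained; the base model's weights stay untouched.
    """
    z = matvec(down, h)      # project the d-dim state into an r-dim subspace
    delta = matvec(up, z)    # map the r-dim code back to d dimensions
    return [x + dx for x, dx in zip(h, delta)]


h = [1.0, 2.0, 3.0, 4.0]                     # toy hidden state, d = 4
down = [[1.0, 0.0, 0.0, 0.0]]                # r = 1: reads the first coordinate
up = [[0.5], [0.0], [0.0], [-1.0]]           # writes to coordinates 0 and 3
up_zero = [[0.0], [0.0], [0.0], [0.0]]       # zero init leaves h unchanged

edited = intervene(h, down, up)
unchanged = intervene(h, down, up_zero)
```

With `up` initialized to zero the edit is the identity, so under this assumed form training would start from the unmodified model's behavior.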

Findings

01  CRAFT improves performance over LoRA-based methods across multiple benchmarks.

02  CRAFT significantly reduces catastrophic forgetting in continual learning scenarios.

03  The approach remains robust to task ordering and scales with model size.

Abstract

Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in three stages: it first routes each task to a group of similar tasks based on output-distribution divergence; it then fine-tunes the model using a Kullback-Leibler (KL) divergence against the group's prior state, which directly controls forgetting and determines convergence; finally, it merges interventions for the updated task into the shared representation using the same KL signal. This design unifies routing, regularization, and merging through a single KL-based objective. CRAFT improves overall performance and reduces forgetting compared to strong LoRA-based approaches…
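The routing stage described in the abstract can be illustrated with a small sketch. It assumes that each task and each group is summarized by a discrete output distribution, and that a task is routed to the group with the smallest KL(task || group). The direction of the divergence, the helper names `kl_divergence` and `route`, and the toy distributions are all assumptions for illustration; the abstract does not specify them.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def route(task_dist, group_dists):
    """Return the name of the group whose distribution is closest to the task.

    ASSUMPTION: closeness is measured as KL(task || group); the paper may use a
    different direction, a symmetric variant, or a threshold for new groups.
    """
    return min(group_dists, key=lambda g: kl_divergence(task_dist, group_dists[g]))


# Hypothetical averaged output distributions over a 3-token vocabulary.
groups = {
    "qa": [0.7, 0.2, 0.1],
    "code": [0.1, 0.2, 0.7],
}
new_task = [0.6, 0.3, 0.1]

chosen = route(new_task, groups)
divergences = {name: kl_divergence(new_task, dist) for name, dist in groups.items()}
```

The same `kl_divergence` quantity could in principle serve as the forgetting signal in the later stages, for example by comparing a group's outputs before and after an update, which is one way to read the abstract's claim that a single KL-based objective drives routing, regularization, and merging.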
