Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning

Yukun Zhao; Lingyong Yan; Zhenyang Li; Shuaiqiang Wang; Zhumin Chen; Zhaochun Ren; Dawei Yin

arXiv:2505.15467·cs.CL·April 15, 2026

TL;DR

The paper introduces Joint Flashback Adaptation, a method that uses a limited number of prompts from old tasks (flashbacks) to support incremental learning in large language models, reducing forgetting and improving generalization without access to the original data.

Contribution

It proposes a novel approach that leverages flashbacks and latent task interpolation for task-agnostic, data-efficient continual learning in large language models.

Findings

01 Outperforms existing methods on 1000+ instruction tasks.

02 Reduces catastrophic forgetting on old tasks.

03 Improves generalization to new tasks.

Abstract

Large language models have achieved remarkable success in various tasks. However, it is challenging for them to learn new tasks incrementally due to catastrophic forgetting. Existing approaches rely on experience replay, optimization constraints, or task differentiation, which encounter strict limitations in real-world scenarios. To address these issues, we propose Joint Flashback Adaptation. We first introduce flashbacks -- a limited number of prompts from old tasks -- when adapting to new tasks and constrain the deviations of the model outputs from those of the original model. We then interpolate latent tasks between flashbacks and new tasks to enable joint learning of relevant latent tasks, new tasks, and flashbacks, alleviating data sparsity in flashbacks and facilitating knowledge sharing for smooth adaptation. Our method requires only a limited number of flashbacks without access to…
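The two ideas in the abstract, constraining output deviation on flashbacks and interpolating latent tasks between flashbacks and new tasks, can be illustrated with a toy sketch. This listing does not give the paper's actual loss or interpolation scheme, so everything below is an assumption made for illustration: KL divergence as the deviation measure, mixup-style linear interpolation of task representations, and the hypothetical names `kl_divergence`, `interpolate`, `joint_loss`, `alpha`, and `beta`.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q). ASSUMPTION: used here as the 'deviation' between the
    original model's output and the adapted model's output on a flashback
    prompt; the paper may use a different measure."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def interpolate(flashback_vec, new_task_vec, lam):
    """ASSUMPTION: a latent task is a linear mix of a flashback representation
    and a new-task representation, with weight lam on the flashback."""
    return [lam * f + (1.0 - lam) * n for f, n in zip(flashback_vec, new_task_vec)]


def joint_loss(new_task_loss, flashback_deviation, latent_loss, alpha=1.0, beta=1.0):
    """ASSUMPTION: a weighted sum of the three terms; alpha and beta are
    hypothetical weights, not values from the paper."""
    return new_task_loss + alpha * flashback_deviation + beta * latent_loss


# Toy numbers standing in for model outputs on one flashback prompt.
original_logits = [2.0, 0.5, -1.0]
adapted_logits = [1.5, 0.9, -0.8]
deviation = kl_divergence(softmax(original_logits), softmax(adapted_logits))

# A latent task one quarter of the way from the new task toward the flashback.
latent = interpolate([1.0, 0.0], [0.0, 1.0], lam=0.25)

total = joint_loss(new_task_loss=0.7, flashback_deviation=deviation, latent_loss=0.4)
```

In this sketch the flashback term needs only the prompts and the original model's outputs on them, not the old tasks' training data, which matches the data-efficiency claim in the abstract. How the latent-task loss itself is computed is left as a plain number here because the listing does not describe it.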
