Multimodal Continual Instruction Tuning with Dynamic Gradient Guidance

Songze Li; Mingyu Gao; Tonghua Su; Xu-Yao Zhang; Zhongjie Wang

arXiv:2511.15164 · cs.CV · March 23, 2026

Open Access

TL;DR

This paper presents a method for multimodal continual instruction tuning that mitigates catastrophic forgetting by approximating the missing old-task gradients from the geometry of the parameter space, improving performance without expanding the model.

Contribution

The paper introduces a gradient guidance approach based on parameter space geometry to address catastrophic forgetting in multimodal continual learning, without increasing model complexity.
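One way to read the geometric idea is as a quadratic surrogate for the old-task loss. The surrogate below is an illustrative assumption, not a formula taken from the paper; the symbols $\theta$ (current parameters), $\theta^{*}_{\mathrm{old}}$ (parameters that were optimal for earlier tasks), and $\tilde{\mathcal{L}}_{\mathrm{old}}$ are introduced here only for the sketch.

```latex
% Assumed surrogate: squared distance to the previously optimal parameters
\tilde{\mathcal{L}}_{\mathrm{old}}(\theta)
  = \tfrac{1}{2}\,\bigl\lVert \theta - \theta^{*}_{\mathrm{old}} \bigr\rVert_2^{2}
\qquad\Longrightarrow\qquad
\nabla_{\theta}\,\tilde{\mathcal{L}}_{\mathrm{old}}(\theta)
  = \theta - \theta^{*}_{\mathrm{old}}
```

Under this reading, the directional vector between the current and the previously optimal parameters is exactly the gradient of the surrogate, so a descent step along it pulls the model back toward the old-task solution without adding any parameters.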

Findings

01

Achieves state-of-the-art results on multimodal continual instruction datasets.

02

Effectively mitigates catastrophic forgetting while maintaining model compactness.

03

Balances stability and plasticity dynamically with a Bernoulli sampling strategy.

Abstract

Multimodal continual instruction tuning enables multimodal large language models to sequentially adapt to new tasks while building upon previously acquired knowledge. However, this continual learning paradigm faces the significant challenge of catastrophic forgetting, where learning new tasks leads to performance degradation on previous ones. In this paper, we introduce a novel insight into catastrophic forgetting by conceptualizing it as a problem of missing gradients from old tasks during new task learning. Our approach approximates these missing gradients by leveraging the geometric properties of the parameter space, specifically using the directional vector between current parameters and previously optimal parameters as gradient guidance. This approximated gradient can be further integrated with real gradients from a limited replay buffer and regulated by a Bernoulli sampling…
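The mechanism described in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `guided_gradient` and `sgd_step`, the sign convention of the guidance vector, the additive combination with the replay gradient, and the gate probability `p` are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def guided_gradient(theta, theta_old, grad_new, grad_replay=None, p=0.5, rng=None):
    """Combine the new-task gradient with an approximated old-task gradient.

    Assumptions (not taken from the paper):
      * the missing old-task gradient is approximated by theta - theta_old,
        so that a descent step moves theta back toward theta_old;
      * a real gradient from a replay buffer, if given, is simply added;
      * a Bernoulli(p) draw decides whether the guidance is applied this step.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Directional vector from the previously optimal parameters to the current ones.
    guidance = theta - theta_old

    # Optionally integrate a real old-task gradient from a small replay buffer.
    if grad_replay is not None:
        guidance = guidance + grad_replay

    # Bernoulli gate: True with probability p (rng.random() lies in [0, 1)).
    apply_guidance = rng.random() < p

    return grad_new + guidance if apply_guidance else grad_new


def sgd_step(theta, grad, lr=0.1):
    """Plain gradient-descent update."""
    return theta - lr * grad


if __name__ == "__main__":
    theta = np.array([1.0, 1.0])       # current parameters
    theta_old = np.array([0.0, 0.0])   # parameters after the previous task
    grad_new = np.array([0.5, -0.5])   # gradient of the new-task loss

    g = guided_gradient(theta, theta_old, grad_new, p=1.0)
    print(sgd_step(theta, g))
```

In this sketch, `p=1.0` applies the guidance on every step (favoring stability), while `p=0.0` reduces to ordinary fine-tuning on the new task (favoring plasticity); intermediate values trade the two off stochastically.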

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications