Harmonious Parameter Adaptation in Continual Visual Instruction Tuning for Safety-Aligned MLLMs

Ziqi Wang; Chang Che; Qi Wang; Hui Ma; Zenglin Shi; Cees G. M. Snoek; Meng Wang

arXiv:2511.20158 · cs.CV · November 26, 2025

Open Access

TL;DR

This paper introduces Harmonious Parameter Adaptation (HPA), a framework for continual visual instruction tuning of safety-aligned multimodal large language models (MLLMs) that balances safety and task performance while reducing forgetting.

Contribution

HPA is a new post-training method that partitions, selects, and orthogonally adjusts parameters to maintain safety and task accuracy during continual learning.

Findings

01. HPA outperforms existing methods in safety preservation.

02. HPA significantly reduces catastrophic forgetting.

03. HPA achieves balanced safety and task performance.

Abstract

While continual visual instruction tuning (CVIT) has shown promise in adapting multimodal large language models (MLLMs), existing studies predominantly focus on models without safety alignment. This critical oversight ignores the fact that real-world MLLMs inherently require such mechanisms to mitigate potential risks. In this work, we shift our focus to CVIT for safety-aligned MLLMs and observe that during continual adaptation, the model not only suffers from task forgetting but also exhibits degradation in its safety. Achieving a harmonious balance between safety and task performance remains a crucial challenge. To address this, we propose Harmonious Parameter Adaptation (HPA), a post-training framework composed of focusing-based parameter partition, harmoniously balanced parameter selection, and orthogonal parameter adjustment. Specifically, HPA partitions parameters into two types…
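The abstract names orthogonal parameter adjustment as one component of HPA, but the text above is cut off before the details. As a generic illustration of what an orthogonal update can look like, and not the authors' procedure, the following minimal sketch removes from a new parameter update any component that lies in the span of earlier updates. The function name `project_orthogonal`, the use of an SVD basis, and the tolerance are all assumptions made for this example.

```python
import numpy as np


def project_orthogonal(update, previous_updates, rel_tol=1e-10):
    """Return `update` with its component in span(previous_updates) removed.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    if len(previous_updates) == 0:
        return update.copy()

    # Each column of A is one flattened earlier update.
    A = np.stack([p.ravel() for p in previous_updates], axis=1)

    # Left singular vectors with non-negligible singular values form an
    # orthonormal basis of the subspace spanned by the earlier updates.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    basis = U[:, s > rel_tol * s.max()]

    # Subtract the projection onto that subspace, then restore the shape.
    flat = update.ravel()
    residual = flat - basis @ (basis.T @ flat)
    return residual.reshape(update.shape)
```

For example, with earlier updates along the first two coordinate axes, `project_orthogonal(np.array([1.0, 2.0, 3.0]), [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])])` keeps only the third coordinate. The intuition is that a step orthogonal to earlier steps leaves their first-order effect undisturbed; which parameters HPA actually adjusts, and against what, is not recoverable from this excerpt.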

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Subtitles and Audiovisual Media