Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent
Junda Wu, Yuxin Xiong, Xintong Li, Yu Xia, Ruoyu Wang, Yu Wang, Tong Yu, Sungchul Kim, Ryan A. Rossi, Lina Yao, Jingbo Shang, Julian McAuley

TL;DR
This paper introduces modality-decoupled gradient descent (MDGD), a method that preserves pre-trained visual knowledge in multimodal large language models (MLLMs) during instruction-tuning, mitigating visual forgetting.
Contribution
The paper proposes MDGD, a gradient regulation technique that maintains visual representation richness and disentangles visual understanding from task alignment, improving visual knowledge retention during fine-tuning.
Findings
MDGD significantly reduces visual forgetting in MLLMs.
The approach supports parameter-efficient fine-tuning through gradient masking.
Extensive experiments show improved visual knowledge preservation and task adaptation.
Abstract
Recent MLLMs have shown emerging visual understanding and reasoning abilities after being pre-trained on large-scale multimodal datasets. Unlike pre-training, where MLLMs receive rich visual-text alignment, instruction-tuning is often text-driven with weaker visual supervision, leading to the degradation of pre-trained visual understanding and causing visual forgetting. Existing approaches, such as direct fine-tuning and continual learning methods, fail to explicitly address this issue, often compressing visual representations and prioritizing task alignment over visual retention, which further worsens visual forgetting. To overcome this limitation, we introduce a novel perspective leveraging effective rank to quantify the degradation of visual representation richness, interpreting this degradation through the information bottleneck principle as excessive compression that leads to the…
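As a rough illustration of the effective-rank idea mentioned in the abstract, the sketch below computes effective rank from a list of singular values. It assumes the common entropy-based definition (the exponential of the Shannon entropy of the normalized singular values). The excerpt here does not state the paper's exact formulation, so this is an assumption, and the function name `effective_rank` is hypothetical. In practice the singular values would come from an SVD of a matrix of visual token representations.

```python
import math


def effective_rank(singular_values):
    """Entropy-based effective rank of a spectrum (assumed definition).

    singular_values: non-negative numbers, e.g. the singular values of a
    feature matrix whose rows are visual token representations.
    """
    if any(s < 0 for s in singular_values):
        raise ValueError("singular values must be non-negative")
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("at least one singular value must be positive")
    # Normalize the spectrum into a probability distribution,
    # dropping zeros since 0 * log(0) is taken to be 0.
    probs = [s / total for s in singular_values if s > 0]
    # Shannon entropy of the normalized spectrum.
    entropy = -sum(p * math.log(p) for p in probs)
    # Exponentiating gives a value between 1 and the number of
    # non-zero singular values.
    return math.exp(entropy)


# A flat spectrum uses every direction equally; a peaked spectrum
# concentrates information in a few directions.
flat = effective_rank([1.0, 1.0, 1.0, 1.0])
peaked = effective_rank([3.0, 1.0, 0.5, 0.1])
```

Under this definition, a drop in effective rank after fine-tuning would indicate that the representations have collapsed onto fewer directions, which is the kind of compression the abstract associates with visual forgetting.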
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Technology and Assessment · Robotics and Automated Systems
