Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning

Chengyi Cai; Zesheng Ye; Peike Li; Bo Han; Jianzhong Qi; Feng Liu

arXiv:2601.22020·cs.LG·January 30, 2026

TL;DR

This paper introduces ViKeR (Visual-Guided Key-Token Regularization), an unlearning method for multimodal large language models. It uses visual cues to identify and prioritize key answer tokens during unlearning, so that the model stops revealing private information about target images while its responses stay coherent.

Contribution

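The paper proposes a visual-guided regularization approach for unlearning in MLLMs. Existing methods, largely adopted from LLM unlearning, treat all answer tokens uniformly and work only in the language modality; this approach instead prioritizes key tokens and uses visual information to indicate them.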

Findings

01 Effective unlearning demonstrated on MLLMU and CLEAR benchmarks.

02 Mitigates forgetting while maintaining response coherence.

03 Prioritizes key tokens using visual cues and entropy-based measures.
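
To make the entropy-based measure in finding 03 concrete, the sketch below computes the Shannon entropy of a next-token distribution at each answer position and ranks positions by it. This is only an illustration of information entropy over token distributions, not the paper's definition of key tokens: the toy distributions, the names `token_entropy` and `per_token_probs`, and the choice to rank from highest to lowest entropy are all assumptions made here.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical next-token distributions over a 4-word toy vocabulary,
# one row per answer position (illustrative values, not from the paper).
per_token_probs = [
    [0.97, 0.01, 0.01, 0.01],  # position 0: model is nearly certain
    [0.25, 0.25, 0.25, 0.25],  # position 1: model is maximally uncertain
    [0.70, 0.10, 0.10, 0.10],  # position 2: in between
]

# Entropy per answer position.
entropies = [token_entropy(p) for p in per_token_probs]

# Positions sorted from highest to lowest entropy (assumed ordering;
# the paper may define key tokens differently).
ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
```

Whether high-entropy or low-entropy positions count as key tokens depends on which distribution the entropy is taken over, which the summary above does not specify.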

Abstract

Unlearning in Multimodal Large Language Models (MLLMs) prevents the model from revealing private information when queried about target images. Existing MLLM unlearning methods largely adopt approaches developed for LLMs. They treat all answer tokens uniformly, disregarding their varying importance in the unlearning process. Moreover, these methods focus exclusively on the language modality, disregarding visual cues that indicate key tokens in answers. In this paper, after formulating the problem of unlearning in multimodal question answering for MLLMs, we propose Visual-Guided Key-Token Regularization (ViKeR). We leverage irrelevant visual inputs to predict ideal post-unlearning token-level distributions and use these distributions to regularize the unlearning process, thereby prioritizing key tokens. Further, we define key tokens in unlearning via information entropy and discuss…
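
The regularization idea in the abstract can be sketched as a token-level divergence between two sets of distributions: reference distributions (standing in for those predicted with an irrelevant visual input) and the current model's distributions on the target image. The sketch below is a minimal illustration under stated assumptions, not the authors' loss: the use of KL divergence, its direction, the averaging over positions, and the names `kl_divergence` and `token_level_regularizer` are all hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def token_level_regularizer(reference, current):
    """Per-position KL terms and their mean.

    reference: assumed "ideal post-unlearning" distributions, one per answer
               position (here a stand-in for predictions made with an
               irrelevant image).
    current:   distributions of the model being unlearned on the target image.
    """
    per_token = [kl_divergence(r, c) for r, c in zip(reference, current)]
    return per_token, sum(per_token) / len(per_token)

# Toy example with a 3-word vocabulary and two answer positions.
reference = [[0.90, 0.05, 0.05],   # position 0: same with or without the target image
             [0.20, 0.40, 0.40]]   # position 1: changes once the image is irrelevant
current = [[0.90, 0.05, 0.05],
           [0.90, 0.05, 0.05]]

per_token, reg = token_level_regularizer(reference, current)
```

In this toy case the first position contributes essentially nothing to the regularizer, while the second, where the image-dependent prediction differs from the reference, dominates it. That is one way a token-level regularizer can concentrate the unlearning signal on a few tokens rather than weighting all answer tokens uniformly.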

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning