To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model

Chengshuai Zhao; Zhen Tan; Dawei Li; Zhiyuan Yu; Huan Liu

arXiv:2605.14291 · cs.CR · May 15, 2026

TL;DR

This paper introduces MMGuard, a proactive method that generates imperceptible perturbations to protect multimodal data from unauthorized fine-tuning of large vision-language models, effectively degrading the performance of models fine-tuned on the protected data.

Contribution

The work presents a novel defense mechanism that creates unlearnable examples with theoretical guarantees, enhancing data protection against various threat models in LVLM fine-tuning.
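For context, unlearnable examples are commonly formalized as a bi-level min-min problem ("error-minimizing noise"): the perturbation is chosen to minimize, rather than maximize, the training loss, so the model can fit the noise instead of the semantic content. The block below is that generic objective, written under the assumption that MMGuard's objective has a similar shape; the symbols ($f_\theta$, $\delta_i$, $\epsilon$, $\mathcal{L}$) are illustrative and not taken from the paper. For an LVLM, $x_i$ would be the multimodal input, $y_i$ the target text, and $\mathcal{L}$ the autoregressive next-token loss; which modalities MMGuard actually perturbs is not stated in this summary.

```latex
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n}
  \min_{\|\delta_i\|_\infty \le \epsilon}
  \mathcal{L}\big(f_\theta(x_i + \delta_i),\, y_i\big)
```

Replacing the inner $\min$ with a $\max$ recovers the adversarial-training objective; flipping that sign is what turns robustness-inducing noise into a learning shortcut.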

Findings

01

MMGuard effectively degrades the downstream performance of LVLMs fine-tuned on protected data.

02

The method is robust across multiple models and datasets.

03

It provides theoretical guarantees for protection effectiveness.

Abstract

The rapid advancement of Large Vision-Language Models (LVLMs) is increasingly accompanied by unauthorized scraping and training on multimodal web data, posing severe copyright and privacy risks to data owners. Existing countermeasures, such as machine unlearning and watermarks, are inherently post-hoc approaches that act only after intellectual property infringement has already occurred. In this work, we propose MMGuard to empower data owners to proactively protect their multimodal data against unauthorized LVLM fine-tuning. MMGuard generates unlearnable examples by injecting human-imperceptible perturbations that actively exploit the learning dynamics of LVLMs. By minimizing the training loss, the perturbation creates an optimization shortcut, causing the model to overfit to the noise and thereby degrading downstream performance when the perturbation is absent during inference. To…
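The "optimization shortcut" idea can be illustrated with a minimal sketch of error-minimizing noise on a toy linear softmax classifier in NumPy. It alternates surrogate-model training with projected sign-gradient steps that decrease the loss with respect to a per-sample perturbation bounded by eps in the L-infinity norm. This is the generic min-min recipe from the unlearnable-examples literature, swapped in for illustration; it is not MMGuard's implementation (which targets LVLMs), and the function names and hyperparameters (`eps`, `alpha`, the step counts) are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grads(W, X, y):
    """Mean cross-entropy of a linear softmax model f(X) = X @ W.

    Returns the loss and its gradients with respect to the weights W
    and the inputs X (the latter drives the perturbation update).
    """
    n = X.shape[0]
    P = softmax(X @ W)                       # (n, k) class probabilities
    loss = -np.log(P[np.arange(n), y] + 1e-12).mean()
    G = P.copy()
    G[np.arange(n), y] -= 1.0                # P - one_hot(y)
    G /= n                                   # dLoss/dlogits for the mean loss
    return loss, X.T @ G, G @ W.T            # loss, dL/dW, dL/dX


def error_minimizing_noise(X, y, num_classes, eps=0.05, alpha=0.01,
                           outer_steps=20, model_steps=10, noise_steps=10,
                           lr=0.5, seed=0):
    """Hypothetical min-min loop: fit a surrogate, then lower its loss via the noise."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], num_classes))
    delta = np.zeros_like(X)
    for _ in range(outer_steps):
        # Outer minimization: fit the surrogate model on the perturbed data.
        for _ in range(model_steps):
            _, grad_W, _ = loss_and_grads(W, X + delta, y)
            W -= lr * grad_W
        # Inner minimization: projected sign-gradient *descent* on delta
        # (an adversarial attack would ascend instead), clipped to the L-inf ball.
        for _ in range(noise_steps):
            _, _, grad_X = loss_and_grads(W, X + delta, y)
            delta = np.clip(delta - alpha * np.sign(grad_X), -eps, eps)
    return delta, W


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(64, 10))
    y = rng.integers(0, 3, size=64)          # random labels: no real signal
    delta, W = error_minimizing_noise(X, y, num_classes=3)
    print("max |delta|:", np.abs(delta).max())
    print("surrogate loss, clean:    ", loss_and_grads(W, X, y)[0])
    print("surrogate loss, perturbed:", loss_and_grads(W, X + delta, y)[0])
```

Running the script prints the perturbation's largest magnitude (clipping keeps it at or below `eps`) and the surrogate's loss on clean versus perturbed inputs. Because the toy labels are random, any drop in loss on the perturbed inputs comes from the noise rather than from the features, which mirrors in miniature the shortcut the abstract describes.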