When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing

Siyuan Xu; Yibing Liu; Peilin Chen; Yung-Hui Li; Shiqi Wang; Sam Kwong

arXiv:2512.07166 · cs.CV · December 9, 2025

TL;DR

This paper addresses the challenge of recovering surrogate-protected private content after editing by multimodal large language models (MLLMs), introducing a new dataset and a guided generation approach that balances privacy recovery with model utility.

Contribution

The paper introduces the SPPE (Surrogate Privacy Protected Editable) dataset for evaluating privacy recovery and proposes a unified guided generation method for reconstructing private content in MLLMs.

Findings

01. Effective privacy recovery is demonstrated on the SPPE and InstructPix2Pix datasets.

02. The approach generalizes well across diverse visual content.

03. The approach achieves a balance between privacy protection and model usability.

Abstract

Privacy leakage in Multimodal Large Language Models (MLLMs) has long been an intractable problem. Existing studies, though they effectively obscure private information in MLLMs, often overlook the evaluation of the authenticity and recovery quality of user privacy. To this end, this work uniquely focuses on the critical challenge of how to restore surrogate-driven protected data in diverse MLLM scenarios. We first bridge this research gap by contributing the SPPE (Surrogate Privacy Protected Editable) dataset, which includes a wide range of privacy categories and user instructions to simulate real MLLM applications. This dataset offers protected surrogates alongside their various MLLM-edited versions, thus enabling the direct assessment of privacy recovery quality. By formulating privacy recovery as a guided generation task conditioned on complementary multimodal signals, we further…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Authorship Attribution and Profiling