Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning

Renyang Liu; Guanlin Li; Tianwei Zhang; See-Kiong Ng

arXiv:2507.07139 · cs.CV · February 17, 2026

TL;DR

This paper introduces Recall, an adversarial framework that exploits multi-modal inputs to challenge the robustness of unlearning in image generation models, revealing vulnerabilities in current methods.

Contribution

Recall is a novel multi-modal adversarial approach that efficiently compromises unlearning in diffusion-based image models using a single reference image.

Findings

01. Recall outperforms existing baselines in effectiveness and efficiency.
02. It exposes vulnerabilities in current unlearning techniques.
03. The study highlights the need for more robust unlearning solutions.

Abstract

Recent advances in image generation models (IGMs), particularly diffusion-based architectures such as Stable Diffusion (SD), have markedly enhanced the quality and diversity of AI-generated visual content. However, their generative capability has also raised significant ethical, legal, and societal concerns, including the potential to produce harmful, misleading, or copyright-infringing content. To mitigate these concerns, machine unlearning (MU) has emerged as a promising solution that selectively removes undesirable concepts from pretrained models. Nevertheless, the robustness and effectiveness of existing unlearning techniques remain largely unexplored, particularly in the presence of multi-modal adversarial inputs. To bridge this gap, we propose Recall, a novel adversarial framework explicitly designed to compromise the robustness of unlearned IGMs. Unlike existing approaches that…

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. Novel multi-modal attack pipeline: latent encoding with reference-image blending, iterative latent optimization, and a final multi-modal attack that pairs the optimized adversarial image with the original text prompt.
2. Strong empirical validation across diverse settings. The evaluation is impressively comprehensive (10 unlearning methods and 3 attack baselines). The proposed method, RECALL, consistently achieves the best attack performance as well as superior semantic alignment.
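The three-stage pipeline summarized above can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the authors' code: latents are plain Python lists, Stage I is taken to be a linear blend with a hypothetical weight `alpha`, the loss is a quadratic surrogate standing in for whatever objective is back-propagated through the unlearned model, and Stage II uses generic signed-gradient projected gradient descent (PGD) under a hypothetical L∞ budget `eps`, which may differ from the paper's actual update rule.

```python
"""Toy sketch of a three-stage latent attack pipeline (hypothetical; not RECALL's code).

Stage I:   blend the latent of a reference image into the initial latent.
Stage II:  iteratively optimize the latent with signed-gradient PGD.
Stage III: pair the optimized latent with the unchanged text prompt.
"""
from typing import List, Tuple

Vector = List[float]


def blend_latents(z_init: Vector, z_ref: Vector, alpha: float) -> Vector:
    """Stage I (assumed linear form): z0 = alpha * z_ref + (1 - alpha) * z_init."""
    return [alpha * r + (1.0 - alpha) * i for i, r in zip(z_init, z_ref)]


def surrogate_loss_grad(z: Vector, target: Vector) -> Tuple[float, Vector]:
    """Hypothetical stand-in loss: squared distance to a target latent.

    A real attack would back-propagate through the unlearned diffusion model;
    this quadratic simply has gradient 2 * (z - target).
    """
    diff = [a - b for a, b in zip(z, target)]
    loss = sum(d * d for d in diff)
    grad = [2.0 * d for d in diff]
    return loss, grad


def pgd_optimize(z0: Vector, target: Vector, eps: float, step: float, iters: int) -> Vector:
    """Stage II: signed-gradient steps, projected onto an L-infinity ball of radius eps around z0."""
    z = list(z0)
    for _ in range(iters):
        _, grad = surrogate_loss_grad(z, target)
        # Move each coordinate against the sign of its gradient (descent on the loss).
        z = [zi - step * ((g > 0) - (g < 0)) for zi, g in zip(z, grad)]
        # Clip back into [z0 - eps, z0 + eps] coordinate-wise.
        z = [min(max(zi, c - eps), c + eps) for zi, c in zip(z, z0)]
    return z


def build_attack_input(z_adv: Vector, prompt: str) -> dict:
    """Stage III: the optimized latent is paired with the original, unmodified prompt."""
    return {"latent": z_adv, "prompt": prompt}


if __name__ == "__main__":
    z0 = blend_latents([0.0, 0.0, 0.0], [1.0, -1.0, 0.5], alpha=0.5)
    target = [0.8, -0.2, 0.4]
    z_adv = pgd_optimize(z0, target, eps=0.3, step=0.05, iters=20)
    print(surrogate_loss_grad(z0, target)[0], surrogate_loss_grad(z_adv, target)[0])
    print(build_attack_input(z_adv, "original prompt")["prompt"])
```

Only the latent is perturbed while the prompt passes through untouched, which mirrors the reviewers' point that RECALL leaves the text prompt unmodified; replacing the surrogate with a loss computed through an actual unlearned model would be the substantive part of any real implementation.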

Weaknesses

1. The authors overclaim the independence of their attack. Only a single reference image is needed during the attack, but that reference image is still generated by the original diffusion model. The method therefore assumes access to the original model, an assumption that does not hold in some cases.
2. Although Appx. F claims robustness across references, the main text underplays the sensitivity of the results to poorly aligned or compositionally distinct

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. RECALL successfully identifies adversarial examples in the latent image space, providing compelling evidence that existing unlearning methods (e.g., fine-tuning, knowledge distillation) fail to fully eradicate sensitive or proprietary concepts.
2. The paper includes a comprehensive experimental evaluation and extensive ablation studies that thoroughly assess the potential of image-level attacks on unlearning methods across various metrics and unlearning targets.
3. Efficiency and Practicality

Weaknesses

1. Despite the claim of "reference independence" in Section 5.5, the method fundamentally relies on a reference image during the adversarial optimization in Stages I and II. The authors must clarify the specific requirements for this reference image. For instance, what characteristics might a reference image possess that could cause the attack to fail or significantly degrade its performance? Furthermore, given the results in Table 4, which suggest that a simple Image-Only attack can already res

Reviewer 03 · Rating 8 · Confidence 5

Strengths

1. Recall introduces a multi-modal (image + text) attack that leaves the text prompt unmodified; it generates images of the unlearned concept while preserving semantic fidelity to the original prompt. The experimental results show state-of-the-art accuracy.
2. Recall is computationally and practically efficient. It does not require external models or classifiers, and performing the adversarial optimization directly in the model's latent space is computationally more efficient, as the experimental results support.
3

Weaknesses

1. The paper is primarily empirical. While the results are good, it lacks a theoretical analysis explaining why the multi-modal pathway is so vulnerable, and it offers no formal guarantees about the attack's convergence.
2. Although the adv_img is effective, a real image generation system could easily reject it with simple safeguards before it reaches the model.
3. Adversarial prompt attacks have proven effective. What about an adversarial prompt combined with an adversarial image; would that achieve a higher A

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Ethics and Social Impacts of AI