SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models

Renyang Liu; Kangjie Chen; Han Qiu; Jie Zhang; Kwok-Yan Lam; Tianwei Zhang; See-Kiong Ng

arXiv:2601.08623·cs.CV·May 7, 2026

TL;DR

SafeRedir is an inference-time framework that redirects the embeddings of unsafe prompts toward safe regions, enabling robust unlearning in image generation models without retraining or modifying the model.

Contribution

SafeRedir introduces a prompt embedding redirection method that combines a safety classifier with token-level interventions to improve the safety and robustness of image generation models.
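The general idea of classifier-gated, token-level redirection can be illustrated with a toy sketch. This is not the authors' implementation: the abstract on this page is truncated, so the linear probe, the single safe anchor, the threshold, and the interpolation rule below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unsafe_score(token_emb, weights, bias):
    """Hypothetical linear probe: probability that one token embedding is unsafe."""
    return sigmoid(sum(w * x for w, x in zip(weights, token_emb)) + bias)

def redirect(prompt_emb, weights, bias, safe_anchor, threshold=0.5, strength=1.0):
    """Move flagged token embeddings toward a safe anchor; copy the rest unchanged.

    Assumption: redirection is a linear interpolation toward one fixed anchor.
    The paper may use a different (e.g. learned) redirection.
    """
    redirected = []
    for tok in prompt_emb:
        if unsafe_score(tok, weights, bias) > threshold:
            tok = [(1.0 - strength) * x + strength * a
                   for x, a in zip(tok, safe_anchor)]
        else:
            tok = list(tok)
        redirected.append(tok)
    return redirected

# Toy 2-D example: the probe fires when the first coordinate is large.
weights, bias = [4.0, 0.0], -2.0
safe_anchor = [0.0, 1.0]
prompt = [[0.1, 0.9], [1.5, 0.2]]  # second token is flagged by the probe
out = redirect(prompt, weights, bias, safe_anchor)
```

In this toy setup only the second token is moved, which mirrors the stated goal of intervening on unsafe tokens while leaving benign content untouched. The text encoder and the generation model would sit before and after this step and are not modeled here.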

Findings

1. Effective unlearning of unsafe concepts demonstrated across multiple tasks.
2. High preservation of benign content and image quality.
3. Enhanced resistance to adversarial prompt attacks.

Abstract

Image generation models (IGMs), while capable of producing impressive and creative content, often memorize a wide range of undesirable concepts from their training data, leading to the reproduction of unsafe content such as NSFW imagery and copyrighted artistic styles. Such behaviors pose persistent safety and compliance risks in real-world deployments and cannot be reliably mitigated by post-hoc filtering, owing to the limited robustness of such mechanisms and a lack of fine-grained semantic control. Recent unlearning methods seek to erase harmful concepts at the model level, but they require costly retraining, degrade the quality of benign generations, or fail to withstand prompt paraphrasing and adversarial attacks. To address these challenges, we introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding…

Code & Models

Repositories

ryliu68/SafeRedir (GitHub)
