FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models
Yi Sun, Zhiqi Zhang, Xinhao Zhong, Yimin Zhou, Shuoyang Sun, Bin Chen, Shu-Tao Xia, Ke Xu

TL;DR
FlowErase-RL introduces a reward-based framework for concept erasure in flow matching models, enhancing safety and control in text-to-image generation without requiring supervised data.
Contribution
It reformulates concept erasure as a reward optimization problem using a dual-path mechanism, enabling scalable and effective multi-concept erasure without explicit supervision.
Findings
Achieves state-of-the-art erasure performance on various concepts.
Maintains high image quality and semantic alignment after erasure.
Demonstrates robustness against adversarial attacks and scalability to multiple concepts.
Abstract
Recent advances in flow matching models have significantly improved text-to-image generation quality, but also introduce growing safety risks due to the generation of harmful or undesirable content. Existing concept erasure methods are either inference-time interventions with limited effectiveness or rely on supervised fine-tuning (SFT), which requires precisely aligned data and struggles with scalability and multi-concept settings. In this paper, we propose FlowErase-RL, the first GRPO-based framework for concept erasure in flow matching models. We reformulate concept erasure as a reward optimization problem and introduce a dynamic dual-path reward mechanism that jointly optimizes (i) a Concept Erasure (CE) reward to suppress target concepts and (ii) a Non-target Space (NS) reward to preserve generative fidelity. The two reward paths are adaptively balanced during…
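To make the reward-optimization framing concrete, here is a minimal sketch of how a group of sampled images could be scored with two rewards and turned into GRPO-style group-relative advantages. It is an illustration under stated assumptions, not the paper's implementation: the function names, the example reward values, and the fixed scalar `weight` are all hypothetical. In particular, the abstract says the two paths are balanced adaptively but is truncated before it describes how, so a constant weight stands in for that mechanism here.

```python
from statistics import mean, pstdev


def combined_reward(ce, ns, weight):
    """Blend a CE reward and an NS reward into one scalar.

    `weight` in [0, 1] is a hypothetical fixed balance; the paper's
    adaptive balancing rule is not given in the abstract.
    """
    return weight * ce + (1.0 - weight) * ns


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, as in GRPO:
    subtract the group mean and divide by the group standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up scores for four images sampled from the same prompt.
ce_rewards = [0.9, 0.2, 0.6, 0.4]  # higher = target concept better suppressed
ns_rewards = [0.5, 0.8, 0.7, 0.6]  # higher = non-target content better preserved
weight = 0.5                       # placeholder for the adaptive balance

rewards = [combined_reward(c, n, weight) for c, n in zip(ce_rewards, ns_rewards)]
advantages = group_relative_advantages(rewards)
```

Samples whose blended reward is above the group mean receive a positive advantage and would be reinforced by the policy update; those below the mean receive a negative one. Because the normalization is relative to the group, no separately learned value function is needed in this sketch.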
