Steering Away from Memorization: Reachability-Constrained Reinforcement Learning for Text-to-Image Diffusion
Sathwik Karnik, Juyeop Kim, Sanmi Koyejo, Jong-Seok Lee, Somil Bansal

TL;DR
This paper introduces RADS, a novel inference-time framework that uses reachability analysis and constrained reinforcement learning to prevent memorization in text-to-image diffusion models while maintaining high image quality and prompt alignment.
Contribution
RADS models the diffusion denoising process as a dynamical system and applies reachability analysis with constrained RL to steer generation away from memorized samples at inference time, without modifying the diffusion backbone.
Findings
RADS outperforms state-of-the-art methods on diversity, quality, and alignment metrics.
It provides robust memorization mitigation without modifying the diffusion backbone.
RADS is a plug-and-play solution for safer text-to-image generation.
Abstract
Text-to-image diffusion models often memorize training data, revealing a fundamental failure to generalize beyond the training set. Current mitigation strategies typically sacrifice image quality or prompt alignment to reduce memorization. To address this, we propose Reachability-Aware Diffusion Steering (RADS), an inference-time framework that prevents memorization while preserving generation fidelity. RADS models the diffusion denoising process as a dynamical system and applies concepts from reachability analysis to approximate the "backward reachable tube"--the set of intermediate states that inevitably evolve into memorized samples. We then formulate mitigation as a constrained reinforcement learning (RL) problem, where a policy learns to steer the trajectory away from memorization via minimal perturbations in the caption embedding space. Empirical evaluations show that RADS…
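To make the abstract's two ideas concrete (a backward reachable tube, and steering with minimal perturbations), here is a minimal sketch on a toy 1-D system. It is not the RADS implementation: the paper learns a policy with constrained RL in caption embedding space, whereas this sketch swaps in grid-based dynamic programming and a simple safety filter. Every name and number below (the drift `-x`, the "memorized" interval, `u_max`, the grid, the function names) is a hypothetical chosen for illustration.

```python
import numpy as np


def backward_reachable_tube(grid, drift, center, radius, u_max, dt, steps,
                            n_controls=11):
    """Toy value function: value[i] < 0 means grid[i] cannot avoid the
    'memorized' interval |x - center| < radius within the horizon, even
    with the best bounded perturbation u in [-u_max, u_max]."""
    margin = np.abs(grid - center) - radius      # < 0 inside the memorized set
    controls = np.linspace(-u_max, u_max, n_controls)
    value = margin.copy()
    for _ in range(steps):
        # Euler step of x' = drift(x) + u for every (control, grid point) pair.
        nxt = grid[None, :] + dt * (drift(grid)[None, :] + controls[:, None])
        nxt_val = np.interp(nxt.ravel(), grid, value).reshape(nxt.shape)
        best = nxt_val.max(axis=0)               # perturbation tries to stay safe
        value = np.minimum(margin, best)         # unsafe now or unsafe later
    return value


def least_perturbation_control(x, grid, value, drift, u_max, dt, n_controls=11):
    """Pick the smallest-magnitude control whose next state has value >= 0;
    if none exists, fall back to the control with the highest next value."""
    controls = np.linspace(-u_max, u_max, n_controls)
    nxt_val = np.interp(x + dt * (drift(x) + controls), grid, value)
    safe = nxt_val >= 0.0
    if safe.any():
        candidates = controls[safe]
        return float(candidates[np.argmin(np.abs(candidates))])
    return float(controls[np.argmax(nxt_val)])


# Hypothetical setup: the nominal dynamics pull every state toward x = 0,
# and the "memorized sample" is the interval |x| < 0.2.
grid = np.linspace(-3.0, 3.0, 601)
drift = lambda x: -x
common = dict(center=0.0, radius=0.2, dt=0.05, steps=40)

v_free = backward_reachable_tube(grid, drift, u_max=0.0, **common)  # no steering
v_ctrl = backward_reachable_tube(grid, drift, u_max=0.5, **common)  # with steering

print("tube width without steering:", 0.01 * np.sum(v_free < 0))
print("tube width with steering:   ", 0.01 * np.sum(v_ctrl < 0))
```

In this toy setting, a state such as x = 1.0 drifts into the memorized interval when left alone but can be kept out once a bounded perturbation is allowed, and the filter leaves the trajectory untouched (u = 0) whenever the unperturbed step is already safe. That mirrors, loosely, the "minimal perturbations" objective described in the abstract.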
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Cell Image Analysis Techniques
