Steering Away from Memorization: Reachability-Constrained Reinforcement Learning for Text-to-Image Diffusion

Sathwik Karnik; Juyeop Kim; Sanmi Koyejo; Jong-Seok Lee; Somil Bansal

arXiv:2603.00140 · cs.CV · March 3, 2026

Open Access

TL;DR

This paper introduces RADS, a novel inference-time framework that uses reachability analysis and constrained reinforcement learning to prevent memorization in text-to-image diffusion models while maintaining high image quality and prompt alignment.

Contribution

RADS models the diffusion denoising process as a dynamical system and combines reachability analysis with constrained RL to mitigate memorization at inference time, without modifying the diffusion backbone.

Findings

01. RADS outperforms state-of-the-art methods on diversity, quality, and alignment metrics.

02. It provides robust memorization mitigation without modifying the diffusion backbone.

03. RADS is a plug-and-play solution for safer text-to-image generation.

Abstract

Text-to-image diffusion models often memorize training data, revealing a fundamental failure to generalize beyond the training set. Current mitigation strategies typically sacrifice image quality or prompt alignment to reduce memorization. To address this, we propose Reachability-Aware Diffusion Steering (RADS), an inference-time framework that prevents memorization while preserving generation fidelity. RADS models the diffusion denoising process as a dynamical system and applies concepts from reachability analysis to approximate the "backward reachable tube": the set of intermediate states that inevitably evolve into memorized samples. We then formulate mitigation as a constrained reinforcement learning (RL) problem, where a policy learns to steer the trajectory away from memorization via minimal perturbations in the caption embedding space. Empirical evaluations show that RADS…
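The idea of a backward reachable tube and minimal steering can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical one-dimensional "denoiser" that pulls the state toward a scalar caption embedding, and it replaces the learned RL policy with a plain grid search over perturbation sizes. All names and constants (`RATE`, `EPS`, `rollout`, `minimal_steering`) are illustrative assumptions.

```python
# Toy 1-D illustration only; every constant and function here is an assumption.
T = 10           # number of denoising steps
RATE = 0.5       # how strongly each step pulls the state toward the caption target
MEMORIZED = 1.0  # location of the memorized sample in this toy state space
EPS = 0.1        # final states within EPS of MEMORIZED count as memorized


def rollout(x, t, caption):
    """Run the toy denoiser from step t to step T under a fixed caption embedding."""
    for _ in range(T - t):
        x = x + RATE * (caption - x)
    return x


def is_memorized(x_final):
    """Membership test for the failure set (final states near the memorized sample)."""
    return abs(x_final - MEMORIZED) < EPS


def in_backward_reachable_tube(x, t, caption):
    """True if the unsteered dynamics carry state x at step t into the failure set."""
    return is_memorized(rollout(x, t, caption))


def minimal_steering(x, t, caption, candidates):
    """Grid-search stand-in for a learned policy: smallest |delta| that escapes."""
    for delta in sorted(candidates, key=abs):
        if not is_memorized(rollout(x, t, caption + delta)):
            return delta
    return None  # no candidate perturbation avoids the failure set


caption = 1.0  # a caption whose target coincides with the memorized sample
x0 = 0.0       # initial noisy state
candidates = [i / 100 for i in range(-50, 51)]  # perturbations in [-0.5, 0.5]
delta = minimal_steering(x0, 0, caption, candidates)

print("in tube at t=0:  ", in_backward_reachable_tube(x0, 0, caption))
print("in tube at t=T-1:", in_backward_reachable_tube(x0, T - 1, caption))
print("smallest escaping perturbation:", delta)
```

In this toy, the same state lies inside the tube early in the trajectory and outside it one step before the end, which is why the tube is defined over (state, time) pairs. The grid search only mimics the "minimal perturbation" objective; a constrained RL policy would presumably learn such perturbations rather than enumerate them.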

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Cell Image Analysis Techniques