Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion
Jiwon Kim, Pureum Kim, SeonHwa Kim, Soobin Park, Eunju Cha, Kyong Hwan Jin

TL;DR
This paper introduces a training-free dual recursive feedback system for controllable text-to-image diffusion models, enhancing the preservation of spatial structures and fine-grained appearance details in generated images.
Contribution
It proposes a novel dual recursive feedback mechanism that refines latents to better reflect control conditions without additional training.
Findings
Improves spatial and appearance control in T2I models
Enables fine-grained generation between class-invariant structures
Produces semantically coherent and structurally consistent images
Abstract
Recent advancements in controllable text-to-image (T2I) diffusion models, such as Ctrl-X and FreeControl, have demonstrated robust spatial and appearance control without requiring auxiliary module training. However, these models often struggle to accurately preserve spatial structures and fail to capture fine-grained conditions related to object poses and scene layouts. To address these challenges, we propose a training-free Dual Recursive Feedback (DRF) system that properly reflects control conditions in controllable T2I models. The proposed DRF consists of appearance feedback and generation feedback that recursively refines the intermediate latents to better reflect the given appearance information and the user's intent. This dual-update mechanism guides latent representations toward reliable manifolds, effectively integrating structural and appearance attributes. Our approach enables…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
