Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion

Jiwon Kim; Pureum Kim; SeonHwa Kim; Soobin Park; Eunju Cha; Kyong Hwan Jin

arXiv:2508.09575·cs.CV·August 14, 2025


TL;DR

This paper introduces a training-free dual recursive feedback system for controllable text-to-image diffusion models, enhancing the preservation of spatial structures and fine-grained appearance details in generated images.

Contribution

The paper proposes a novel dual recursive feedback mechanism that refines intermediate latents so they better reflect control conditions, without additional training.

Findings

01. Improves spatial and appearance control in T2I models.
02. Enables fine-grained generation between class-invariant structures.
03. Produces semantically coherent and structurally consistent images.

Abstract

Recent advancements in controllable text-to-image (T2I) diffusion models, such as Ctrl-X and FreeControl, have demonstrated robust spatial and appearance control without requiring auxiliary module training. However, these models often struggle to accurately preserve spatial structures and fail to capture fine-grained conditions related to object poses and scene layouts. To address these challenges, we propose a training-free Dual Recursive Feedback (DRF) system that properly reflects control conditions in controllable T2I models. The proposed DRF consists of appearance feedback and generation feedback, which recursively refine the intermediate latents to better reflect the given appearance information and the user's intent. This dual-update mechanism guides latent representations toward reliable manifolds, effectively integrating structural and appearance attributes. Our approach enables…
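
The abstract describes the two feedback signals only at a high level. The sketch below is a toy illustration of the general idea of recursively refining a latent under two feedback signals. It is not the authors' DRF. Several things here are assumptions: the latent is a plain Python list, each feedback is modeled as a gradient step on a hypothetical squared-distance loss toward an "appearance target" or a "structure target", and all names (dual_recursive_refine, lr_app, lr_gen, num_iters) are hypothetical. In the paper, the signals come from the diffusion model and the control conditions rather than from fixed target vectors.

```python
from typing import List

Vector = List[float]


def grad_sq_distance(z: Vector, target: Vector) -> Vector:
    """Gradient of the squared distance ||z - target||^2 with respect to z."""
    return [2.0 * (zi - ti) for zi, ti in zip(z, target)]


def step(z: Vector, grad: Vector, lr: float) -> Vector:
    """One gradient-descent update: z <- z - lr * grad."""
    return [zi - lr * gi for zi, gi in zip(z, grad)]


def dual_recursive_refine(
    z: Vector,
    appearance_target: Vector,
    structure_target: Vector,
    lr_app: float = 0.25,
    lr_gen: float = 0.25,
    num_iters: int = 50,
) -> Vector:
    """Toy dual feedback loop (hypothetical stand-in, not the paper's DRF).

    Each iteration applies an 'appearance feedback' step that pulls the
    latent toward the appearance target, then a 'generation feedback' step
    that pulls it toward the structure target. The output of one iteration
    is fed back as the input to the next, which is the recursive part.
    """
    for _ in range(num_iters):
        z = step(z, grad_sq_distance(z, appearance_target), lr_app)
        z = step(z, grad_sq_distance(z, structure_target), lr_gen)
    return z


if __name__ == "__main__":
    refined = dual_recursive_refine(
        [0.0, 0.0], appearance_target=[1.0, 1.0], structure_target=[0.0, 2.0]
    )
    print([round(v, 4) for v in refined])  # prints [0.3333, 1.6667]
```

With both step sizes at 0.25, each iteration is a contraction, and the loop converges to a/3 + 2s/3, where a is the appearance target and s is the structure target. The result leans toward the structure target because the generation step runs last in every iteration. Changing the order or the step sizes shifts the balance between the two attributes.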
