Diff-Aid: Inference-time Adaptive Interaction Denoising for Rectified Text-to-Image Generation
Binglei Li, Mengping Yang, Zhiyu Tan, Junping Zhang, Hao Li

TL;DR
Diff-Aid is an inference-time adaptive method for text-to-image diffusion models that improves semantic alignment and visual quality by dynamically adjusting the interactions between textual and visual features across transformer blocks and denoising timesteps.
Contribution
The paper proposes a flexible, plug-and-play inference-time approach that adaptively modulates text-image interactions, improving generation quality and interpretability in diffusion models.
Findings
Consistent improvements in prompt adherence and visual quality.
Enhanced human preference scores across experiments.
Effective integration with downstream applications like style transfer and zero-shot editing.
Abstract
Recent text-to-image (T2I) diffusion models have made remarkable advances, yet faithfully following complex textual descriptions remains challenging due to insufficient interactions between textual and visual features. Prior approaches enhance such interactions via architectural design or handcrafted textual condition weighting, but lack flexibility and overlook the dynamic interactions across different blocks and denoising stages. To provide a more flexible and efficient solution to this problem, we propose Diff-Aid, a lightweight inference-time method that adaptively adjusts per-token text and image interactions across transformer blocks and denoising timesteps. Beyond improving generation quality, Diff-Aid yields interpretable modulation patterns that reveal how different blocks, timesteps, and textual tokens contribute to semantic alignment during denoising. As a…
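As a rough illustration of what per-token modulation of text-image interaction could look like, the sketch below rescales each text token's influence inside a single cross-attention step. It is not the authors' implementation: the abstract does not say how Diff-Aid computes its coefficients, so the lookup table `alpha_table` (indexed by block, timestep, and token), the function names, and the choice to apply the scale as a log-bias on the attention logits are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def modulated_cross_attention(img_q, txt_k, txt_v, alpha):
    """Image queries attend to text tokens (hypothetical helper).

    img_q: (n_img, d) image-side queries
    txt_k, txt_v: (n_txt, d) text-side keys and values
    alpha: (n_txt,) positive per-token scales; alpha == 1 changes nothing
    """
    d = img_q.shape[-1]
    logits = img_q @ txt_k.T / np.sqrt(d)       # (n_img, n_txt)
    logits = logits + np.log(alpha)[None, :]    # per-token bias on the logits
    weights = softmax(logits, axis=-1)
    return weights @ txt_v, weights

# Toy sizes, chosen only for the example.
rng = np.random.default_rng(0)
n_blocks, n_steps, n_img, n_txt, d = 2, 4, 6, 3, 8
img_q = rng.normal(size=(n_img, d))
txt_k = rng.normal(size=(n_txt, d))
txt_v = rng.normal(size=(n_txt, d))

# Assumed stand-in for an adaptive module: one scale per (block, step, token).
alpha_table = np.ones((n_blocks, n_steps, n_txt))
alpha_table[1, 0, 2] = 4.0  # hypothetical: boost token 2 in block 1 at step 0

# Block 0, step 0 uses all-ones scales, so it matches plain attention.
out_base, w_base = modulated_cross_attention(img_q, txt_k, txt_v, alpha_table[0, 0])
# Block 1, step 0 gives token 2 more attention mass from every image query.
out_mod, w_mod = modulated_cross_attention(img_q, txt_k, txt_v, alpha_table[1, 0])
```

Applying the scale as an additive log-bias keeps each attention row normalized, so boosting one token necessarily takes weight from the others. A method that adapts at inference time would presumably predict these scales from the prompt and the current denoising state rather than read them from a fixed table.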
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Digital Humanities and Scholarship
