Test-time Conditional Text-to-Image Synthesis Using Diffusion Models
Tripti Shukla, Srikrishna Karanam, Balaji Vasan Srinivasan

TL;DR
This paper introduces TINTIN, a training-free, test-time method for conditional text-to-image synthesis with diffusion models, enabling flexible control over outputs using various conditioning factors like color and edges.
Contribution
The paper presents TINTIN, a novel test-time algorithm that manipulates diffusion model outputs without retraining, allowing control with multiple conditioning inputs such as color palettes and edge maps.
Findings
Significant qualitative improvements over state-of-the-art methods.
Effective control of generated images using color palettes and edge maps.
Demonstrated flexibility and extensibility of the approach.
Abstract
We consider the problem of conditional text-to-image synthesis with diffusion models. Most recent works need to either finetune specific parts of the base diffusion model or introduce new trainable parameters, leading to deployment inflexibility due to the need for training. To address this gap in the current literature, we propose TINTIN (Test-time Conditional Text-to-Image Synthesis using Diffusion Models), a new training-free, test-time-only algorithm to condition text-to-image diffusion model outputs on conditioning factors such as color palettes and edge maps. In particular, we propose to interpret noise predictions during denoising as gradients of an energy-based model, leading to a flexible approach to manipulate the noise by matching predictions inferred from them to the ground truth conditioning input. This results in, to the best of our knowledge, the…
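The idea of treating the predicted noise as something that can be nudged toward a conditioning target can be illustrated with a toy sketch. This is not the TINTIN algorithm, whose details are not given in this summary. It assumes a 1-D "image" stored as a list of floats, a scalar mean intensity standing in for a color palette, a squared-error loss, and the standard DDPM estimate of the clean sample. The names predict_clean, guide_noise, step_size, and n_steps are hypothetical and chosen only for this example.

```python
import math

def predict_clean(x_t, eps, alpha_bar):
    """Standard DDPM estimate of the clean sample from x_t and predicted noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps)]

def mean(values):
    return sum(values) / len(values)

def guide_noise(x_t, eps, alpha_bar, target_mean, step_size=1.0, n_steps=50):
    """Adjust eps by gradient descent on L = (mean(x0_hat) - target_mean)^2.

    Assumption: the mean intensity is a toy stand-in for a real conditioning
    signal such as a color palette; no model weights are touched.
    """
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = list(eps)
    n = len(eps)
    for _ in range(n_steps):
        err = mean(predict_clean(x_t, eps, alpha_bar)) - target_mean
        # dL/d eps_i = 2 * err * d(mean x0_hat)/d eps_i = 2 * err * (-b / (a * n))
        grad = 2.0 * err * (-b / (a * n))
        eps = [e - step_size * grad for e in eps]
    return eps

# Toy inputs (made up for illustration, not taken from the paper).
x_t = [0.2, -0.1, 0.4, 0.3]
eps = [0.1, -0.3, 0.2, 0.0]
alpha_bar = 0.5
target_mean = 0.8

guided_eps = guide_noise(x_t, eps, alpha_bar, target_mean)
error_before = abs(mean(predict_clean(x_t, eps, alpha_bar)) - target_mean)
error_after = abs(mean(predict_clean(x_t, guided_eps, alpha_bar)) - target_mean)
print(error_before, error_after)
```

In this sketch only the noise estimate changes at test time, which matches the training-free setting the abstract describes. A real system would presumably compute the gradient through a learned predictor of the conditioning signal rather than a closed-form mean.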
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Handwritten Text Recognition Techniques · Computer Graphics and Visualization Techniques
Methods: Balanced Selection · Diffusion
