Test-time Conditional Text-to-Image Synthesis Using Diffusion Models

Tripti Shukla; Srikrishna Karanam; Balaji Vasan Srinivasan

arXiv:2411.10800·cs.CV·November 19, 2024

TL;DR

This paper introduces TINTIN, a training-free, test-time method for conditional text-to-image synthesis with diffusion models. It steers the generated image toward conditioning inputs such as color palettes and edge maps without retraining the base model.

Contribution

The paper presents TINTIN, a novel test-time algorithm that manipulates diffusion model outputs without retraining, allowing control with multiple conditioning inputs such as color palettes and edge maps.

Findings

1. The authors report significant qualitative improvements over state-of-the-art methods.

2. Generated images can be controlled effectively using color palettes and edge maps.

3. The approach is shown to be flexible and extensible to different conditioning inputs.

Abstract

We consider the problem of conditional text-to-image synthesis with diffusion models. Most recent works need to either fine-tune specific parts of the base diffusion model or introduce new trainable parameters, leading to deployment inflexibility due to the need for training. To address this gap in the current literature, we propose TINTIN: Test-time Conditional Text-to-Image Synthesis using Diffusion Models, a new training-free, test-time-only algorithm that conditions text-to-image diffusion model outputs on conditioning factors such as color palettes and edge maps. In particular, we propose to interpret noise predictions during denoising as gradients of an energy-based model, leading to a flexible approach to manipulate the noise by matching predictions inferred from them to the ground truth conditioning input. This results in, to the best of our knowledge, the…
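The idea of treating the noise prediction as an energy gradient and nudging it toward a conditioning target can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic classifier-guidance-style update under stated assumptions. The function names (`predict_x0`, `mean_color`, `guide_noise`), the per-channel mean as a stand-in for a color palette, and the squared-error loss are all hypothetical choices made for illustration, and the noise prediction is treated as a constant with respect to the noisy sample so that the gradient can be written in closed form.

```python
import math

def predict_x0(x_t, eps, alpha_bar):
    """Standard DDPM estimate of the clean sample from a noisy one:
    x0_hat = (x_t - sqrt(1 - alpha_bar) * eps) / sqrt(alpha_bar)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[(x - s * e) / a for x, e in zip(px, pe)]
            for px, pe in zip(x_t, eps)]

def mean_color(pixels):
    """Toy palette descriptor (an assumption): per-channel mean of RGB pixels."""
    n = len(pixels)
    return [sum(p[c] for p in pixels) / n for c in range(3)]

def guide_noise(x_t, eps, alpha_bar, target, scale=1.0):
    """Shift eps along the gradient of a color-matching loss.

    loss = 0.5 * ||mean_color(x0_hat) - target||^2, with eps held constant
    w.r.t. x_t (a simplification: no backprop through a denoising network).
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x0_hat = predict_x0(x_t, eps, alpha_bar)
    diff = [m - t for m, t in zip(mean_color(x0_hat), target)]
    n = len(x_t)
    # d loss / d x_t is the same for every pixel: diff / (n * sqrt(alpha_bar)).
    grad = [d / (n * a) for d in diff]
    # Guidance-style update: eps' = eps + scale * sqrt(1 - alpha_bar) * grad.
    return [[e + scale * s * g for e, g in zip(pe, grad)] for pe in eps]

def color_error(x_t, eps, alpha_bar, target):
    """Euclidean distance between the predicted mean color and the target."""
    m = mean_color(predict_x0(x_t, eps, alpha_bar))
    return math.sqrt(sum((mi - ti) ** 2 for mi, ti in zip(m, target)))
```

In this toy setting, replacing `eps` with `guide_noise(x_t, eps, alpha_bar, target, scale)` and recomputing `color_error` shows the predicted clean image moving toward the target color for a suitably small `scale`. A real pipeline would apply such an update at each denoising step and would use a descriptor matched to the conditioning input (for example an edge detector for edge maps).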

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Handwritten Text Recognition Techniques · Computer Graphics and Visualization Techniques

Methods: Balanced Selection · Diffusion