Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis

Zhiyu Jin; Xuli Shen; Bin Li; Xiangyang Xue

arXiv:2306.08645·cs.CV·October 27, 2023·6 cites

TL;DR

This paper introduces a training-free method to adapt diffusion models for variable-sized text-to-image synthesis, improving image quality and text alignment across different resolutions without additional training.

Contribution

It proposes a novel scaling factor that adjusts attention entropy to handle various image sizes, eliminating the need for retraining or fine-tuning.

Findings

01 Enhanced image quality across different resolutions

02 Improved text-image alignment without extra training

03 Validated effectiveness through extensive experiments

Abstract

Diffusion models (DMs) have recently gained attention for their state-of-the-art performance in text-to-image synthesis. Following the convention in deep learning, DMs are trained and evaluated on images of a fixed size. However, users demand images of specific sizes and various aspect ratios. This paper focuses on adapting text-to-image diffusion models to handle such variety while maintaining visual fidelity. First, we observe that, during synthesis, lower-resolution images suffer from incomplete object portrayal, while higher-resolution images exhibit repetitively disordered presentation. Next, we establish a statistical relationship indicating that attention entropy changes with token quantity, suggesting that models aggregate spatial information in proportion to image resolution. The subsequent interpretation of our observations is that objects are…
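The relationship between attention entropy and token quantity mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' code. It assumes random Gaussian queries and keys in place of real model activations, and `rescaled_factor` is a hypothetical logarithmic rescaling of the usual 1/sqrt(d) attention scale, chosen here only to show how a scaling factor can counteract the entropy shift. All function names and parameters (`mean_attention_entropy`, `train_tokens`, and so on) are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_attention_entropy(num_tokens, dim=64, scale=None, num_queries=16, seed=0):
    """Average entropy of softmax attention rows over random queries.

    Assumption: queries and keys are i.i.d. standard Gaussian vectors,
    standing in for real activations. `scale` multiplies the dot products;
    it defaults to the standard 1/sqrt(dim).
    """
    rng = random.Random(seed)
    if scale is None:
        scale = 1.0 / math.sqrt(dim)
    keys = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]
    total = 0.0
    for _ in range(num_queries):
        query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
        total += entropy(softmax(logits))
    return total / num_queries


def rescaled_factor(num_tokens, train_tokens, dim=64):
    """Hypothetical scale that grows with log(num_tokens) / log(train_tokens).

    It reduces to 1/sqrt(dim) when num_tokens == train_tokens. This form is
    an assumption for illustration, not a formula quoted from this page.
    """
    return math.sqrt(math.log(num_tokens) / math.log(train_tokens) / dim)
```

Calling `mean_attention_entropy(64)` and `mean_attention_entropy(256)` shows entropy rising with the number of tokens under the default scale. Passing `scale=rescaled_factor(256, 64)` sharpens the attention rows at 256 tokens and lowers their entropy, which is the kind of adjustment a training-free scaling factor can make without touching model weights.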

Code & Models

Repositories

leonhlj/fouriscale
pytorch

Videos

Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis · slideslive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Diffusion