Layered Diffusion Model for One-Shot High Resolution Text-to-Image Synthesis

Emaad Khwaja; Abdullah Rashwan; Ting Chen; Oliver Wang; Suraj Kothawade; Yeqing Li

arXiv:2407.06079·cs.CV·July 9, 2024

TL;DR

This paper introduces a layered diffusion model that synthesizes high-resolution images from text descriptions with a single model. Its layered U-Net generates images at multiple resolution scales simultaneously, which avoids separate super-resolution stages and lowers the computational cost per step.

Contribution

The layered U-Net architecture enables one-shot high-resolution text-to-image synthesis, outperforming the baseline of synthesizing only at the target resolution while reducing the computational cost per step.

Findings

01. Outperforms baseline single-resolution models

02. Reduces computational cost per step

03. Achieves higher resolution synthesis without extra models

Abstract

We present a one-shot text-to-image diffusion model that can generate high-resolution images from natural language descriptions. Our model employs a layered U-Net architecture that simultaneously synthesizes images at multiple resolution scales. We show that this method outperforms the baseline of synthesizing images only at the target resolution, while reducing the computational cost per step. We demonstrate that higher resolution synthesis can be achieved by layering convolutions at additional resolution scales, in contrast to other methods which require additional models for super-resolution synthesis.
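
The abstract does not say how the resolution scales are constructed or noised, so the following is only a minimal sketch of what "multiple resolution scales" could look like as training inputs. It assumes average pooling to build the pyramid and the standard forward-diffusion step applied independently at each scale with a shared noise level. The names `downsample`, `build_pyramid`, `noise_pyramid`, and `alpha_bar` are hypothetical and are not taken from the paper.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool an (H, W, C) image by an integer factor (assumed scheme)."""
    h, w, c = image.shape
    assert h % factor == 0 and w % factor == 0, "factor must divide H and W"
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

def build_pyramid(image, num_scales):
    """Return [full, 1/2, 1/4, ...] resolution copies of the image."""
    return [image if s == 0 else downsample(image, 2 ** s)
            for s in range(num_scales)]

def noise_pyramid(pyramid, alpha_bar, rng):
    """Apply x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps per scale.

    Sharing one alpha_bar across scales and drawing independent noise
    at each scale are assumptions made for this illustration.
    """
    noisy = []
    for x0 in pyramid:
        eps = rng.standard_normal(x0.shape)
        noisy.append(np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps)
    return noisy

rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(64, 64, 3))  # stand-in for a real image
pyramid = build_pyramid(image, num_scales=3)
noisy = noise_pyramid(pyramid, alpha_bar=0.5, rng=rng)
shapes = [x.shape for x in noisy]
```

Here `shapes` lists one noisy tensor per scale, from the target resolution down to the coarsest. A layered model in the spirit of the abstract would consume and predict all of these in one network rather than chaining a base model with separate super-resolution models.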

Taxonomy

Topics: Computer Graphics and Visualization Techniques

Methods: Max Pooling · Concatenated Skip Connection · Convolution · U-Net · Diffusion