PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

Junsong Chen; Yue Wu; Simian Luo; Enze Xie; Sayak Paul; Ping Luo; Hang Zhao; Zhenguo Li

arXiv:2401.05252 · cs.CV · January 11, 2024 · 1 citation

TL;DR

PIXART-δ is a fast, controllable, high-resolution text-to-image synthesis framework that combines Latent Consistency Models (LCM) and ControlNet, generating 1024x1024 images in 2-4 sampling steps and about 0.5 seconds.

Contribution

The report introduces PIXART-δ, which integrates LCM and ControlNet into PIXART-α for fast, controllable, high-quality image generation at 1024px resolution with efficient training and inference.

Findings

01  Achieves 0.5 s inference time for 1024x1024 images

02  Can be trained on 32GB V100 GPUs within a single day

03  Supports 8-bit inference for 1024px images within 8GB of GPU memory

Abstract

This technical report introduces PIXART-δ, a text-to-image synthesis framework that integrates the Latent Consistency Model (LCM) and ControlNet into the advanced PIXART-α model. PIXART-α is recognized for its ability to generate high-quality images of 1024px resolution through a remarkably efficient training process. The integration of LCM in PIXART-δ significantly accelerates inference, enabling the production of high-quality images in just 2-4 steps. Notably, PIXART-δ generates 1024x1024 pixel images in 0.5 seconds, a 7x improvement over PIXART-α. Additionally, PIXART-δ is designed to be efficiently trainable on 32GB V100 GPUs within a single day. With its 8-bit inference capability (von Platen et al., 2023), PIXART-δ can synthesize 1024px images within 8GB GPU memory constraints,…
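The latency figures in the abstract can be related with a little arithmetic. The sketch below is only illustrative: it assumes the stated "7x" is a ratio of per-image latencies on the same hardware, which the abstract does not spell out, and the variable and function names are hypothetical.

```python
# Figures quoted in the abstract.
delta_latency_s = 0.5  # PIXART-δ, one 1024x1024 image
speedup = 7.0          # stated improvement over PIXART-α

# Implied PIXART-α latency, ASSUMING "7x" means a ratio of
# per-image latencies measured on the same hardware.
alpha_latency_s = delta_latency_s * speedup


def images_per_minute(latency_s: float) -> float:
    """Sequential throughput for a given per-image latency."""
    return 60.0 / latency_s


print(f"Implied PIXART-α latency: {alpha_latency_s:.1f} s per image")
print(f"PIXART-δ throughput: {images_per_minute(delta_latency_s):.0f} images/min")
print(f"PIXART-α throughput: {images_per_minute(alpha_latency_s):.1f} images/min")
```

Under that assumption the implied PIXART-α latency is about 3.5 seconds per image; the report itself should be consulted for the measured baseline and hardware.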

Code & Models

Repositories

PixArt-alpha/PixArt-alpha (PyTorch, official)

Models

aipicasso/commonart-beta (Hugging Face model · 50 downloads · 15 likes)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Diffusion