Shifted Diffusion for Text-to-image Generation

Yufan Zhou; Bingchen Liu; Yizhe Zhu; Xiao Yang; Changyou Chen; Jinhui Xu

arXiv:2211.15388 · cs.CV · March 28, 2023 · 1 citation

TL;DR

Corgi introduces a shifted diffusion model that improves text-to-image generation by encoding prior knowledge from the pre-trained CLIP model into the diffusion process. This enables efficient semi-supervised training and outperforms existing models such as DALL-E 2 and Lafite.

Contribution

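The paper proposes a shifted diffusion approach that incorporates prior knowledge of the pre-trained CLIP model into the diffusion process through a new initialization distribution and a new transition step. This improves text-to-image generation and enables language-free training.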

Findings

1. Outperforms the DALL-E 2 diffusion prior in both efficiency and effectiveness when generating image embeddings from text.
2. Achieves state-of-the-art results in language-free text-to-image generation.
3. Enables semi-supervised training with only limited captioned data.

Abstract

We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which generates better image embeddings from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion. Compared to the strong DALL-E 2 baseline, our method performs better at generating image embeddings from text in terms of both efficiency and effectiveness, resulting in better text-to-image generation. Extensive large-scale experiments, evaluated with both quantitative measures and human evaluation, indicate a stronger generation ability of our method compared to existing ones. Furthermore, our model enables semi-supervised and…
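The abstract names two changes relative to the DALL-E 2 prior: a new initialization distribution and a new transition step. The sketch below is one plausible reading of that idea under stated assumptions, not the authors' code (the official repository is drboog/Shifted_Diffusion). It assumes the initialization distribution is a Gaussian N(μ, Σ) fitted to CLIP image embeddings, and that each forward transition adds a shift (1 − √α_t)·μ with noise drawn from N(0, Σ), so the chain drifts toward μ rather than toward the origin. The linear β schedule, all function names, and the use of NumPy are our assumptions. The paper's exact parameterization may differ.

```python
import numpy as np


def linear_alphas(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """alpha_t = 1 - beta_t for a linear beta schedule (assumed, DDPM-style)."""
    return 1.0 - np.linspace(beta_start, beta_end, T)


def estimate_init_distribution(embeddings: np.ndarray, jitter: float = 1e-6):
    """Fit N(mu, Sigma) to (n, d) image embeddings.

    Returns mu and a Cholesky factor L with Sigma = L @ L.T (jitter keeps Sigma positive definite).
    """
    mu = embeddings.mean(axis=0)
    sigma = np.cov(embeddings, rowvar=False) + jitter * np.eye(embeddings.shape[1])
    return mu, np.linalg.cholesky(sigma)


def correlated_noise(shape, chol: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw eps ~ N(0, Sigma) per row: z @ L.T has covariance L @ L.T."""
    return rng.standard_normal(shape) @ chol.T


def shifted_forward_step(x_prev, alpha_t, mu, chol, rng):
    """Assumed transition: x_t = sqrt(a_t) x_{t-1} + (1 - sqrt(a_t)) mu + sqrt(1 - a_t) eps."""
    shift = (1.0 - np.sqrt(alpha_t)) * mu
    noise = np.sqrt(1.0 - alpha_t) * correlated_noise(np.shape(x_prev), chol, rng)
    return np.sqrt(alpha_t) * x_prev + shift + noise


def shifted_marginal(x0, t, alphas, mu, chol, rng):
    """Closed form of q(x_t | x_0) for 0-indexed step t under the assumed transition:
    x_t = sqrt(abar) x_0 + (1 - sqrt(abar)) mu + sqrt(1 - abar) eps, eps ~ N(0, Sigma).
    """
    abar = np.prod(alphas[: t + 1])
    noise = np.sqrt(1.0 - abar) * correlated_noise(np.shape(x0), chol, rng)
    return np.sqrt(abar) * x0 + (1.0 - np.sqrt(abar)) * mu + noise


def sample_initialization(n, mu, chol, rng):
    """Starting point of the reverse (generation) chain: x_T ~ N(mu, Sigma) instead of N(0, I)."""
    return mu + correlated_noise((n, mu.shape[0]), chol, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for CLIP image embeddings (random data, not real CLIP outputs).
    fake_embeddings = 2.0 + 0.5 * rng.standard_normal((5000, 4))
    mu, chol = estimate_init_distribution(fake_embeddings)
    alphas = linear_alphas(1000)
    xT = shifted_marginal(np.zeros((20000, 4)), 999, alphas, mu, chol, rng)
    print("fitted mu:   ", np.round(mu, 3))
    print("mean of x_T: ", np.round(xT.mean(axis=0), 3))
```

Running the script prints the fitted μ next to the empirical mean of x_T. The two nearly coincide because the cumulative product ᾱ_T is close to zero under this schedule, so the terminal distribution is approximately N(μ, Σ). With μ = 0 and Σ = I, the sketch reduces to the standard DDPM forward process. Centering on μ means the reverse chain starts near the region where the (assumed) image embeddings actually lie.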

Code & Models

Repositories

drboog/Shifted_Diffusion (JAX, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training · Diffusion