Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Jiatao Gu; Ying Shen; Shuangfei Zhai; Yizhe Zhang; Navdeep Jaitly; Joshua M. Susskind

arXiv:2405.21048·cs.CV·June 3, 2024

TL;DR

Kaleido Diffusion introduces autoregressive latent priors to enhance diversity in conditional diffusion model outputs, effectively controlling image generation while maintaining high quality.

Contribution

The paper proposes Kaleido, a novel method integrating autoregressive latent modeling to improve diversity and control in diffusion-based image generation from text.

Findings

01. Increased diversity of generated images with high quality.
02. Effective control of image outputs guided by latent variables.
03. Compatibility with various discrete latent representations.

Abstract

Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in the sampled images, particularly when sampling with a high classifier-free guidance weight. To address this issue, we present Kaleido, a novel approach that enhances the diversity of samples by incorporating autoregressive latent priors. Kaleido integrates an autoregressive language model that encodes the original caption and generates latent variables, serving as abstract and intermediary representations for guiding and facilitating the image generation process. In this paper, we explore a variety of discrete latent representations, including textual descriptions, detection bounding boxes, object blobs, and visual tokens. These representations diversify and enrich the input conditions to the diffusion…
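
The two-stage idea in the abstract (an autoregressive prior samples discrete latents from the caption, and those latents then enrich the condition given to the diffusion model) can be illustrated with a minimal toy sketch. Everything here is an assumption for illustration and not the authors' implementation: the vocabulary `LATENT_VOCAB`, the functions `sample_latents`, `toy_denoiser`, `guided_eps`, and `generate`, the CRC-based stand-in "logits", and the guidance convention `eps_uncond + w * (eps_cond - eps_uncond)`, which is one common way of writing classifier-free guidance.

```python
import math
import random
import zlib

# Hypothetical toy vocabulary of discrete latent tokens (illustration only).
LATENT_VOCAB = ["<box:left>", "<box:right>", "<blob:small>", "<blob:large>", "<eos>"]


def sample_latents(caption, rng, max_len=4, temperature=1.0):
    """Toy autoregressive prior: sample latent tokens one at a time.

    Each step's distribution depends on the caption and on the tokens
    sampled so far, mimicking p(z_t | caption, z_<t).
    """
    tokens = []
    for _ in range(max_len):
        context = caption + "|" + " ".join(tokens)
        # Stand-in "logits": a deterministic function of context and token.
        logits = [(zlib.crc32((context + tok).encode()) % 100) / 25.0
                  for tok in LATENT_VOCAB]
        weights = [math.exp(l / temperature) for l in logits]
        tok = rng.choices(LATENT_VOCAB, weights=weights, k=1)[0]
        if tok == "<eos>":
            break
        tokens.append(tok)
    return tokens


def toy_denoiser(x, condition):
    """Stand-in for a noise-prediction network eps(x, condition)."""
    if condition is None:
        shift = 0.0
    else:
        shift = (zlib.crc32(condition.encode()) % 7) / 10.0
    return [xi - shift for xi in x]


def guided_eps(x, condition, guidance_weight):
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one, scaled by guidance_weight."""
    eps_uncond = toy_denoiser(x, None)
    eps_cond = toy_denoiser(x, condition)
    return [u + guidance_weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def generate(caption, seed, steps=10, guidance_weight=7.5):
    """Stage 1: sample latents. Stage 2: run a toy guided denoising loop."""
    rng = random.Random(seed)
    latents = sample_latents(caption, rng)
    # The sampled latents are appended to the caption as extra conditioning.
    condition = caption + " " + " ".join(latents)
    x = [rng.gauss(0.0, 1.0) for _ in range(4)]
    for _ in range(steps):
        eps = guided_eps(x, condition, guidance_weight)
        # Toy update rule, not a real diffusion sampler.
        x = [xi - 0.1 * e for xi, e in zip(x, eps)]
    return latents, x
```

In this sketch, calling `generate("a cat on a sofa", seed=s)` with different seeds can produce different latent sequences for the same caption, so the condition passed to the denoiser varies even when the guidance weight is high. That is the intuition the abstract gives for how latent priors diversify samples; a real system would replace the toy prior and denoiser with trained networks.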

Taxonomy

Topics: Topic Modeling

Methods: Diffusion