Decoder-Only LLMs are Better Controllers for Diffusion Models

Ziyi Dong; Yao Xiao; Pengxu Wei; Liang Lin

arXiv:2502.04412·cs.CV·February 10, 2025

Open Access

TL;DR

This paper introduces a method that improves diffusion-based text-to-image models by integrating decoder-only large language models, yielding better semantic understanding of prompts and higher-quality image generation.

Contribution

The paper proposes an adapter that enables diffusion models to leverage decoder-only LLMs, enhancing their semantic understanding of text prompts and their generation performance.
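
To make the idea of an adapter concrete, here is a minimal sketch of the simplest possible bridge: a linear projection that maps per-token LLM hidden states to the width a diffusion model's text-conditioning input expects. This is an illustrative assumption, not the paper's adapter design, which this summary does not describe. The names `make_adapter` and `adapt`, and the dimensions used, are hypothetical.

```python
import random


def make_adapter(llm_dim, cond_dim, seed=0):
    """Build a hypothetical linear adapter (weight matrix and bias).

    llm_dim:  width of the LLM hidden states (assumed value, not from the paper)
    cond_dim: width expected by the diffusion model's conditioning input (assumed)
    """
    rng = random.Random(seed)
    scale = llm_dim ** -0.5  # keep projected activations at a moderate scale
    weight = [[rng.gauss(0.0, scale) for _ in range(llm_dim)]
              for _ in range(cond_dim)]
    bias = [0.0] * cond_dim
    return weight, bias


def adapt(hidden_states, adapter):
    """Project each token's LLM hidden state to the conditioning width."""
    weight, bias = adapter
    return [
        [sum(w * h for w, h in zip(row, token)) + b
         for row, b in zip(weight, bias)]
        for token in hidden_states
    ]


# Usage: 5 prompt tokens with 16-dimensional hidden states, mapped to width 8.
adapter = make_adapter(llm_dim=16, cond_dim=8)
tokens = [[0.1] * 16 for _ in range(5)]
conditioning = adapt(tokens, adapter)
```

In a real system the weights would be trained while the LLM and possibly the diffusion backbone stay frozen; the sketch only shows the shape contract such a bridge has to satisfy.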

Findings

01. Enhanced models outperform state-of-the-art models in generation quality and reliability.

02. The adapter effectively bridges diffusion models with decoder-only LLMs.

03. Theoretical analysis supports the architecture choices.

Abstract

Groundbreaking advancements in text-to-image generation have recently been achieved with the emergence of diffusion models. These models exhibit a remarkable ability to generate highly artistic and intricately detailed images based on textual prompts. However, obtaining desired generation outcomes often necessitates repetitive trials of manipulating text prompts just like casting spells on a magic mirror, and the reason behind that is the limited capability of semantic understanding inherent in current image generation models. Specifically, existing diffusion models encode the text prompt input with a pre-trained encoder structure, which is usually trained on a limited number of image-caption pairs. The state-of-the-art large language models (LLMs) based on the decoder-only structure have shown a powerful semantic understanding capability as their architectures are more suitable for…

Taxonomy

Topics: Numerical methods for differential equations

Methods: Diffusion · Adapter