LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu; Yuhang Ma; Yang Zhen; Jun Dan; Yunlong Yu; Zeng Zhao; Zhipeng Hu; Bai Liu; Changjie Fan

arXiv:2407.00737·cs.CV·August 28, 2024·1 cites

TL;DR

This paper introduces LLM4GEN, a framework that enhances text-to-image diffusion models by integrating large language model (LLM) representations. It improves semantic understanding, especially for complex prompts, with reported color-accuracy gains of 9.69% for SD1.5 and 12.90% for SDXL.

Contribution

The paper presents a plug-and-play Cross-Adapter Module (CAM), which fuses the diffusion model's original text features with LLM features, and an entity-guided regularization loss that improves semantic alignment and image quality in text-to-image generation.

Findings

01 9.69% improvement in color accuracy for SD1.5

02 12.90% improvement in color accuracy for SDXL

03 Outperforms existing models in image-text alignment and human evaluations

Abstract

Diffusion models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts involving multiple objects, attribute binding, and long descriptions. In this paper, we propose a novel framework called LLM4GEN, which enhances the semantic understanding of text-to-image diffusion models by leveraging the representation of Large Language Models (LLMs). It can be seamlessly incorporated into various diffusion models as a plug-and-play component. A specially designed Cross-Adapter Module (CAM) integrates the original text features of text-to-image models with LLM features, thereby enhancing text-to-image generation. Additionally, to facilitate and correct entity-attribute relationships in text prompts, we develop an entity-guided regularization loss to further improve generation performance. We…
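
To make the idea of fusing text-encoder features with LLM features concrete, here is a minimal, dependency-free sketch. It is not the authors' CAM: the abstract does not specify the fusion mechanism, so the single-head cross-attention, the residual connection, the `gate` parameter, and every name and dimension below are assumptions chosen for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def cross_adapter(text_feats, llm_feats, w_q, w_k, w_v, gate=1.0):
    """Hypothetical fusion step (an assumption, not the paper's CAM).

    text_feats: [n_text][d_text] original text-encoder tokens (queries)
    llm_feats:  [n_llm][d_llm]   LLM tokens (keys and values)
    w_q: [d_text][d_attn], w_k: [d_llm][d_attn], w_v: [d_llm][d_text]
    Returns [n_text][d_text]: text_feats plus gated, attended LLM features.
    """
    q = matmul(text_feats, w_q)
    k = matmul(llm_feats, w_k)
    v = matmul(llm_feats, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]           # transpose keys
    scores = matmul(q, k_t)                        # [n_text][n_llm]
    attn = [softmax([s * scale for s in row]) for row in scores]
    attended = matmul(attn, v)                     # [n_text][d_text]
    # Residual connection keeps the original text features intact.
    return [[t + gate * a for t, a in zip(t_row, a_row)]
            for t_row, a_row in zip(text_feats, attended)]


rng = random.Random(0)
text_feats = rand_matrix(4, 8, rng)    # 4 text-encoder tokens, width 8 (toy sizes)
llm_feats = rand_matrix(6, 16, rng)    # 6 LLM tokens, width 16 (toy sizes)
w_q = rand_matrix(8, 8, rng)
w_k = rand_matrix(16, 8, rng)
w_v = rand_matrix(16, 8, rng)

fused = cross_adapter(text_feats, llm_feats, w_q, w_k, w_v)
print(len(fused), len(fused[0]))       # prints: 4 8
```

Because the output keeps the shape of the original text features, a module of this kind could in principle sit in front of an unchanged diffusion backbone, which is one way to read the abstract's "plug-and-play" claim. Setting `gate=0.0` returns the original text features unchanged.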

Code & Models

Repositories

yuhang-ma/llm4gen
pytorch

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques

Methods: Diffusion