LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Mushui Liu, Yuhang Ma, Yang Zhen, Jun Dan, Yunlong Yu, Zeng Zhao, Zhipeng Hu, Bai Liu, Changjie Fan

TL;DR
This paper introduces LLM4GEN, a framework that enhances text-to-image diffusion models by integrating large language model representations. It improves semantic understanding, especially for complex prompts, with reported color-accuracy gains of 9.69% for SD1.5 and 12.90% for SDXL.
Contribution
The paper presents a plug-and-play Cross-Adapter Module (CAM) and an entity-guided regularization loss to improve semantic alignment and image quality in text-to-image generation models.
Findings
9.69% improvement in color accuracy for SD1.5
12.90% improvement in color accuracy for SDXL
Outperforms existing models in image-text alignment and human evaluations
Abstract
Diffusion models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts involving multiple objects, attribute binding, and long descriptions. In this paper, we propose a novel framework called LLM4GEN, which enhances the semantic understanding of text-to-image diffusion models by leveraging the representation of Large Language Models (LLMs). It can be seamlessly incorporated into various diffusion models as a plug-and-play component. A specially designed Cross-Adapter Module (CAM) integrates the original text features of text-to-image models with LLM features, thereby enhancing text-to-image generation. Additionally, to facilitate and correct entity-attribute relationships in text prompts, we develop an entity-guided regularization loss to further improve generation performance. We…
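Illustrative sketch (not the authors' implementation)
The abstract says CAM integrates the original text-encoder features with LLM features but gives no architectural details. One common way to combine two token sequences is cross-attention with a residual connection, and the pure-Python toy below shows that idea only. Everything in it is an assumption made for illustration: the function names (cross_attend, fuse), the gate parameter, the absence of learned projections, and the requirement that both feature sets share the same dimension.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(text_feats, llm_feats):
    """Each text token (query) attends over the LLM tokens (keys and values).

    Hypothetical simplification: no learned Q/K/V projections, single head,
    and both inputs are lists of equal-length float vectors.
    """
    dim = len(text_feats[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in text_feats:
        # Scaled dot-product score between this query and every LLM token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in llm_feats]
        weights = softmax(scores)
        # Weighted average of the LLM token vectors.
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, llm_feats)) for d in range(dim)]
        )
    return attended


def fuse(text_feats, llm_feats, gate=0.5):
    """Residual fusion: original text features plus gated attended LLM features.

    The scalar gate is an assumed knob, not something the abstract describes.
    """
    attended = cross_attend(text_feats, llm_feats)
    return [
        [t + gate * a for t, a in zip(t_row, a_row)]
        for t_row, a_row in zip(text_feats, attended)
    ]


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for text-encoder token features
    llm = [[0.5, 0.5], [1.0, -1.0]]   # stand-in for LLM token features
    for row in fuse(text, llm):
        print(row)
```

The output keeps the sequence length and width of the text features, so in this toy setup it could be passed on wherever the original text features were used. With gate=0.0 the fusion returns the text features unchanged, which is one way to read "plug-and-play" in a sketch like this.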
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques
Methods: Diffusion
