Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang

TL;DR
This paper introduces ParaDiffusion, a diffusion model that leverages large language models and a new dataset to improve paragraph-to-image generation, achieving better semantic alignment and visual quality for complex scenes.
Contribution
It presents a novel information-enriched diffusion model and a high-quality dataset for long-text image generation, enhancing semantic alignment and image fidelity.
Findings
Outperforms state-of-the-art models on key benchmarks
Achieves up to 15% improvement in visual appeal
Achieves up to 45% improvement in text faithfulness
Abstract
Text-to-image (T2I) models have recently experienced rapid development, achieving astonishing performance in terms of fidelity and textual alignment capabilities. However, given a long paragraph (up to 512 words), these generation models still struggle to achieve strong alignment and are unable to generate images depicting complex scenes. In this paper, we introduce an information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, which explores transferring the extensive semantic comprehension capabilities of large language models to image generation. At its core is the use of a large language model (e.g., Llama V2) to encode long-form text, followed by fine-tuning with LoRA to align the text-image feature spaces in the generation task. To facilitate the training of long-text semantic alignment, we also curated a high-quality…
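The abstract mentions fine-tuning with LoRA to align the text and image feature spaces. As a rough illustration of the general LoRA idea only (not the authors' implementation), the sketch below adds a trainable low-rank update to a frozen linear projection. The class name `LoRALinear`, the dimensions, the rank, the `alpha` scaling, and the random stand-in weights are all assumptions made for this example; the excerpt does not state where LoRA is applied or with which hyperparameters.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical frozen projection plus a trainable low-rank update."""

    def __init__(self, d_in, d_out, rank, alpha, seed=0):
        rng = random.Random(seed)
        # Frozen "pretrained" weight (random stand-in), shape d_in x d_out.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(d_out)] for _ in range(d_in)]
        # Trainable down-projection, shape d_in x rank, small random init.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(rank)] for _ in range(d_in)]
        # Trainable up-projection, shape rank x d_out, zero init so the
        # layer initially behaves exactly like the frozen projection.
        self.up = [[0.0] * d_out for _ in range(rank)]
        self.scaling = alpha / rank

    def forward(self, x):
        # x has shape batch x d_in.
        base = matmul(x, self.weight)
        delta = matmul(matmul(x, self.down), self.up)
        return [[b + self.scaling * d for b, d in zip(rb, rd)]
                for rb, rd in zip(base, delta)]

    def num_trainable(self):
        # Only the two low-rank factors would be updated during fine-tuning.
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])


layer = LoRALinear(d_in=8, d_out=4, rank=2, alpha=4.0)
features = layer.forward([[1.0] * 8])
```

In this toy configuration the low-rank factors hold 8 × 2 + 2 × 4 = 24 trainable values next to 32 frozen ones. The saving grows with layer size, since the factor count scales with the rank rather than with the product of the input and output widths.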
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Computational and Text Analysis Methods
Methods: Diffusion
