Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models
Arash Marioriyad, Parham Rezaei, Mahdieh Soleymani Baghshah, and Mohammad Hossein Rohban

TL;DR
This paper evaluates the compositional generation abilities of diffusion-based and autoregressive text-to-image models, revealing that diffusion models like FLUX outperform autoregressive models like LlamaGen in compositional tasks.
Contribution
The paper provides a comparative analysis of recent diffusion and autoregressive T2I models on compositional generation, highlighting the strengths of diffusion models like FLUX.
Findings
LlamaGen underperforms diffusion models in compositional tasks.
FLUX achieves performance comparable to DALL-E 3.
Diffusion models show superior compositional capabilities.
Abstract
Text-to-image (T2I) generative models, such as Stable Diffusion and DALL-E, have shown remarkable proficiency in producing high-quality, realistic, and natural images from textual descriptions. However, these models sometimes fail to accurately capture all the details specified in the input prompts, particularly concerning entities, attributes, and spatial relationships. This issue becomes more pronounced when the prompt contains novel or complex compositions, leading to what are known as compositional generation failure modes. Recently, a new open-source diffusion-based T2I model, FLUX, has been introduced, demonstrating strong performance in high-quality image generation. Additionally, autoregressive T2I models like LlamaGen have claimed competitive visual quality performance compared to diffusion-based models. In this study, we evaluate the compositional generation capabilities of…
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling
Methods: Diffusion
