Unlocking the Potential of Text-to-Image Diffusion with PAC-Bayesian Theory
Eric Hanchen Jiang, Yasi Zhang, Zhi Zhang, Yixin Wan, Andrew Lizarraga, Shufan Li, Ying Nian Wu

TL;DR
This paper introduces a PAC-Bayesian framework to improve text-to-image diffusion models by designing attention priors, enhancing attribute-object alignment, and achieving state-of-the-art results on standard benchmarks.
Contribution
It presents a novel Bayesian approach that uses custom attention priors within diffusion models to improve alignment and generalization in text-to-image generation.
Findings
Achieves state-of-the-art performance on standard benchmarks.
Improves attribute-object alignment and modifier-noun binding.
Enhances image quality and model robustness.
Abstract
Text-to-image (T2I) diffusion models have revolutionized generative modeling by producing high-fidelity, diverse, and visually realistic images from textual prompts. Despite these advances, existing models struggle with complex prompts involving multiple objects and attributes, often misaligning modifiers with their corresponding nouns or neglecting certain elements. Recent attention-based methods have improved object inclusion and linguistic binding, but still face challenges such as attribute misbinding and a lack of robust generalization guarantees. Leveraging the PAC-Bayes framework, we propose a Bayesian approach that designs custom priors over attention distributions to enforce desirable properties, including divergence between objects, alignment between modifiers and their corresponding nouns, minimal attention to irrelevant tokens, and regularization for better generalization.…
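The four properties listed in the abstract can be read as terms of a single objective over per-token cross-attention maps. The sketch below is only an illustration of that reading, not the authors' formulation: the function names (`normalize`, `kl`, `sym_kl`, `attention_loss`), the choice of symmetric KL divergence, the raw-mass penalty for irrelevant tokens, the squared-magnitude regularizer, and the equal default weights are all assumptions made for this example.

```python
import math

EPS = 1e-12  # small constant to avoid log(0) and division by zero


def normalize(weights):
    """Scale non-negative attention weights into a distribution over pixels."""
    total = sum(weights) + EPS
    return [w / total for w in weights]


def kl(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q))


def sym_kl(p, q):
    """Symmetric KL divergence, used here as a distance between attention maps."""
    return 0.5 * (kl(p, q) + kl(q, p))


def attention_loss(maps, object_pairs, modifier_pairs, irrelevant,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical four-term loss over attention maps.

    maps:           dict mapping token -> list of non-negative attention weights
    object_pairs:   (object, object) pairs whose maps should differ
    modifier_pairs: (modifier, noun) pairs whose maps should agree
    irrelevant:     tokens that should receive little attention
    weights:        assumed coefficients for the four terms
    """
    w_div, w_sim, w_out, w_reg = weights
    dists = {tok: normalize(m) for tok, m in maps.items()}

    # 1. Divergence between objects: larger distance lowers the loss.
    divergence = -sum(sym_kl(dists[a], dists[b]) for a, b in object_pairs)
    # 2. Modifier-noun alignment: smaller distance lowers the loss.
    similarity = sum(sym_kl(dists[m], dists[n]) for m, n in modifier_pairs)
    # 3. Irrelevant tokens: penalize their total raw attention mass.
    outside = sum(sum(maps[t]) for t in irrelevant)
    # 4. Regularization: squared magnitude of all raw attention weights.
    reg = sum(w * w for m in maps.values() for w in m)

    return w_div * divergence + w_sim * similarity + w_out * outside + w_reg * reg
```

For a prompt such as "the red car and a dog", one would pass `object_pairs=[("car", "dog")]`, `modifier_pairs=[("red", "car")]`, and `irrelevant=["the"]`. Under these assumptions the loss is lower when the map for "red" overlaps the map for "car" than when it overlaps the map for "dog".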
Taxonomy
Topics: Video Analysis and Summarization · Image Retrieval and Classification Techniques · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Diffusion
