On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi, Pietro Astolfi, Melissa Hall, Reyhane Askari-Hemmat, Yohann Benchetrit, Marton Havasi, Matthew Muckley, Karteek Alahari, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal

TL;DR
This paper systematically studies training recipes for latent diffusion models, introduces a new conditioning mechanism, and achieves state-of-the-art results in class-conditional and text-to-image generation.
Contribution
It provides an in-depth analysis of conditioning and transfer strategies in LDM training and proposes a novel conditioning method that improves generation quality.
Findings
Disentangling semantic and control conditioning improves model performance.
Transfer learning from smaller datasets enhances training efficiency.
The proposed conditioning mechanism achieves state-of-the-art FID scores on multiple datasets.
Abstract
Large-scale training of latent diffusion models (LDMs) has enabled unprecedented quality in image generation. However, the key components of the best performing LDM training recipes are oftentimes not available to the research community, preventing apple-to-apple comparisons and hindering the validation of progress in the field. In this work, we perform an in-depth study of LDM training recipes focusing on the performance of models and their training efficiency. To ensure apple-to-apple comparisons, we re-implement five previously published models with their corresponding recipes. Through our study, we explore the effects of (i) the mechanisms used to condition the generative model on semantic information (e.g., text prompt) and control metadata (e.g., crop size, random flip flag, etc.) on the model performance, and (ii) the transfer of the representations learned on smaller and…
Taxonomy
Methods: FLIP · Diffusion
