On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models

Tariq Berrada Ifriqi; Pietro Astolfi; Melissa Hall; Reyhane Askari-Hemmat; Yohann Benchetrit; Marton Havasi; Matthew Muckley; Karteek Alahari; Adriana Romero-Soriano; Jakob Verbeek; Michal Drozdzal

arXiv:2411.03177 · cs.CV · January 22, 2025

TL;DR

This paper systematically studies training recipes for latent diffusion models, introduces a new conditioning mechanism, and achieves state-of-the-art results in class-conditional and text-to-image generation.

Contribution

It provides an in-depth analysis of conditioning and transfer strategies in LDM training and proposes a novel conditioning method that improves generation quality.

Findings

01. Disentangling semantic and control conditioning improves model performance.

02. Transfer learning from smaller datasets enhances training efficiency.

03. New conditioning mechanism achieves state-of-the-art FID scores on multiple datasets.
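
The findings report results as FID scores, which this page does not define. As a reminder of what the metric measures, here is a minimal sketch of the standard Fréchet distance between two Gaussians fitted to image features. It assumes the features have already been extracted (in practice usually with an Inception-v3 network, which is omitted here), and the function names `frechet_distance` and `fid_from_features` are hypothetical. It is an illustration of the metric only, not the authors' evaluation code.

```python
import numpy as np


def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Frechet distance between N(mu_r, sigma_r) and N(mu_g, sigma_g).

    d^2 = ||mu_r - mu_g||^2 + Tr(sigma_r + sigma_g - 2 (sigma_r sigma_g)^{1/2})
    """
    mu_r = np.asarray(mu_r, dtype=float)
    mu_g = np.asarray(mu_g, dtype=float)
    sigma_r = np.asarray(sigma_r, dtype=float)
    sigma_g = np.asarray(sigma_g, dtype=float)

    diff = mu_r - mu_g
    # Tr((sigma_r sigma_g)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g. Those eigenvalues are real and
    # non-negative when both covariances are positive semi-definite; the
    # clip only guards against small negative values from round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)


def fid_from_features(feats_real, feats_gen):
    """FID from two (num_images, feature_dim) arrays of precomputed features.

    Assumption: the feature extractor is run elsewhere; this function only
    fits the Gaussians and compares them.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)
```

Lower values mean the generated feature distribution is closer to the real one. Identical statistics give a distance of zero, and shifting only the mean by a vector `d` (with equal covariances) gives `||d||^2`.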

Abstract

Large-scale training of latent diffusion models (LDMs) has enabled unprecedented quality in image generation. However, the key components of the best performing LDM training recipes are oftentimes not available to the research community, preventing apple-to-apple comparisons and hindering the validation of progress in the field. In this work, we perform an in-depth study of LDM training recipes focusing on the performance of models and their training efficiency. To ensure apple-to-apple comparisons, we re-implement five previously published models with their corresponding recipes. Through our study, we explore the effects of (i) the mechanisms used to condition the generative model on semantic information (e.g., text prompt) and control metadata (e.g., crop size, random flip flag, etc.) on the model performance, and (ii) the transfer of the representations learned on smaller and…

Taxonomy

Methods: FLIP · Diffusion