Learning Diffusion Models with Flexible Representation Guidance

Chenyu Wang; Cai Zhou; Sharut Gupta; Zongyu Lin; Stefanie Jegelka; Stephen Bates; Tommi Jaakkola

arXiv:2507.08980 · cs.LG · October 14, 2025

TL;DR

This paper introduces a systematic framework for incorporating flexible representation guidance into diffusion models, leading to improved generation quality and significantly faster training across various domains.

Contribution

It presents a new theoretical framework and two novel strategies for enhancing representation alignment in diffusion models, with demonstrated empirical benefits.

Findings

01. Faster training on ImageNet with 23.3x speedup

02. Improved generation quality across multiple tasks

03. State-of-the-art performance with new guidance methods

Abstract

Diffusion models can be improved with additional guidance towards more effective representations of input. Indeed, prior empirical work has already shown that aligning internal representations of the diffusion model with those of pre-trained models improves generation quality. In this paper, we present a systematic framework for incorporating representation guidance into diffusion models. We provide alternative decompositions of denoising models along with their associated training criteria, where the decompositions determine when and how the auxiliary representations are incorporated. Guided by our theoretical insights, we introduce two new strategies for enhancing representation alignment in diffusion models. First, we pair examples with target representations either derived from themselves or arising from different synthetic modalities, and subsequently learn a joint model over the…
