What happens to diffusion model likelihood when your model is conditional?

Mattias Cross; Anton Ragni

arXiv:2409.06364·cs.LG·September 27, 2024

Open Access

TL;DR

This paper investigates how likelihoods derived from diffusion models behave in conditional settings such as text-to-image (TTI) and text-to-speech (TTS) synthesis, and finds that they are less sensitive to the conditioning input than expected.

Contribution

The paper characterizes the properties and limitations of diffusion model likelihoods in conditional tasks, highlighting their insensitivity to conditioning inputs.

Findings

01. TTS diffusion likelihoods are agnostic to text input.

02. TTI likelihoods are more expressive but cannot detect confounding prompts.

03. Conditional diffusion likelihoods are less sensitive than expected.

Abstract

Diffusion Models (DMs) iteratively denoise random samples to produce high-quality data. The iterative sampling process is derived from Stochastic Differential Equations (SDEs), allowing a speed-quality trade-off chosen at inference. Another advantage of sampling with differential equations is exact likelihood computation. These likelihoods have been used to rank unconditional DMs and for out-of-domain classification. Despite the many existing and possible uses of DM likelihoods, the distinct properties captured are unknown, especially in conditional contexts such as Text-To-Image (TTI) or Text-To-Speech synthesis (TTS). Surprisingly, we find that TTS DM likelihoods are agnostic to the text input. TTI likelihood is more expressive but cannot discern confounding prompts. Our results show that applying DMs to conditional tasks reveals inconsistencies and strengthens claims that the…
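As a rough illustration of the "exact likelihood computation" the abstract mentions, the sketch below integrates a probability-flow ODE together with the divergence of its drift, then adds the log-density of the prior at the final time. Everything concrete here is an assumption made for illustration and not taken from the paper: the 1-D Gaussian toy data distribution, the constant-rate variance-preserving SDE, the analytic score, and all names (`BETA`, `marginal`, `ode_log_likelihood`, and so on). In a real diffusion model the score would be a trained network, the divergence would typically be estimated rather than computed in closed form, and a conditional model would also pass the conditioning input (for example the text) to the score function.

```python
import math

# Toy setup (all values are illustrative assumptions, not from the paper).
BETA = 20.0          # constant noise rate of the VP-SDE
T = 1.0              # final diffusion time
MU0, S0 = 1.5, 0.5   # toy data distribution N(MU0, S0^2)


def marginal(t):
    """Mean and variance of the perturbed distribution p_t."""
    decay = math.exp(-BETA * t)
    return MU0 * math.sqrt(decay), S0 ** 2 * decay + (1.0 - decay)


def score(x, t):
    """Analytic score of p_t; a trained network would replace this."""
    mean, var = marginal(t)
    return -(x - mean) / var


def drift(x, t):
    """Probability-flow ODE drift: f(x, t) - 0.5 * g(t)^2 * score(x, t)."""
    return -0.5 * BETA * x - 0.5 * BETA * score(x, t)


def divergence(x, t):
    """d(drift)/dx. Exact here; it does not depend on x for a Gaussian."""
    _, var = marginal(t)
    return -0.5 * BETA + 0.5 * BETA / var


def ode_log_likelihood(x0, steps=2000):
    """Integrate x and the divergence jointly from t=0 to t=T with RK4."""
    h = T / steps
    x, div_integral = x0, 0.0
    for i in range(steps):
        t = i * h
        k1, d1 = drift(x, t), divergence(x, t)
        x2 = x + 0.5 * h * k1
        k2, d2 = drift(x2, t + 0.5 * h), divergence(x2, t + 0.5 * h)
        x3 = x + 0.5 * h * k2
        k3, d3 = drift(x3, t + 0.5 * h), divergence(x3, t + 0.5 * h)
        x4 = x + h * k3
        k4, d4 = drift(x4, t + h), divergence(x4, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        div_integral += h * (d1 + 2 * d2 + 2 * d3 + d4) / 6.0
    # Standard normal prior at t = T (an approximation to the true p_T).
    log_prior = -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x
    return log_prior + div_integral


def exact_log_likelihood(x0):
    """Closed-form log-density of the toy data distribution."""
    return -0.5 * math.log(2.0 * math.pi * S0 ** 2) - 0.5 * ((x0 - MU0) / S0) ** 2


if __name__ == "__main__":
    for x0 in (1.0, 1.5, 2.3):
        print(x0, ode_log_likelihood(x0), exact_log_likelihood(x0))
```

In this toy case the ODE-based value can be checked against the closed-form density; the small remaining gap comes from using a standard normal prior at the final time and from discretisation. No such ground truth is available for a learned TTI or TTS model.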

Taxonomy

Topics: Advanced Neuroimaging Techniques and Applications

Methods: Diffusion