Improving Controllable Generation: Faster Training and Better Performance via $x_0$-Supervision

Amadou S. Sangare; Adrien Maglo; Mohamed Chaouch; Bertrand Luvison

arXiv:2604.05761 · cs.CV · April 8, 2026

TL;DR

This paper introduces $x_0$-supervision for controllable text-to-image diffusion models: supervising the clean image directly speeds up convergence by up to 2× and improves both visual quality and conditioning accuracy.

Contribution

The paper presents a new training objective based on direct supervision of the clean image, which leads to faster convergence and better performance in controllable diffusion models.

Findings

01 Accelerates convergence by up to 2× using $x_0$-supervision.

02 Improves visual quality and conditioning accuracy.

03 Introduces a novel metric, mean AUCC, for measuring convergence speed.

Abstract

Text-to-Image (T2I) diffusion/flow models have recently achieved remarkable progress in visual fidelity and text alignment. However, they remain limited when users need to precisely control image layouts, something that natural language alone cannot reliably express. Controllable generation methods augment the initial T2I model with additional conditions that more easily describe the scene. Prior works straightforwardly train the augmented network with the same loss as the initial network. Although natural at first glance, this can lead to very long training times in some cases before convergence. In this work, we revisit the training objective of controllable diffusion models through a detailed analysis of their denoising dynamics. We show that direct supervision on the clean target image, dubbed $x_{0}$-supervision, or an equivalent re-weighting of the diffusion loss, yields faster…
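The equivalence the abstract mentions between supervising the clean image and re-weighting the diffusion loss can be illustrated numerically. The sketch below is not taken from the paper or its repository. It assumes a standard noise-prediction parameterization, $x_t = \alpha_t x_0 + \sigma_t \epsilon$, in which the clean-image estimate is $\hat{x}_0 = (x_t - \sigma_t \hat{\epsilon}) / \alpha_t$. Under that assumption, the squared error on $x_0$ equals the usual noise-prediction error scaled by $\sigma_t^2 / \alpha_t^2$. All names (`x0_from_eps`, `alpha_t`, `sigma_t`) and the toy values are hypothetical, and the exact weighting used by the authors may differ.

```python
import math
import random


def x0_from_eps(x_t, eps_hat, alpha_t, sigma_t):
    """Clean-image estimate implied by a noise prediction (assumed parameterization)."""
    return [(xt - sigma_t * e) / alpha_t for xt, e in zip(x_t, eps_hat)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


rng = random.Random(0)
dim = 8
# Hypothetical schedule values at one timestep (variance-preserving: 0.6^2 + 0.8^2 = 1).
alpha_t, sigma_t = 0.6, 0.8

# Toy "clean image" and noise, then the noisy sample x_t = alpha_t * x0 + sigma_t * eps.
x0 = [rng.gauss(0, 1) for _ in range(dim)]
eps = [rng.gauss(0, 1) for _ in range(dim)]
x_t = [alpha_t * a + sigma_t * e for a, e in zip(x0, eps)]

# Stand-in for a network output: the true noise plus a small prediction error.
eps_hat = [e + 0.1 * rng.gauss(0, 1) for e in eps]

# Loss computed directly on the clean-image estimate.
x0_loss = mse(x0_from_eps(x_t, eps_hat, alpha_t, sigma_t), x0)

# Standard noise-prediction loss, re-weighted by sigma_t^2 / alpha_t^2.
eps_loss = mse(eps_hat, eps)
reweighted_eps_loss = (sigma_t / alpha_t) ** 2 * eps_loss

print(x0_loss, reweighted_eps_loss)  # the two values agree up to rounding
```

Because the weight $\sigma_t^2 / \alpha_t^2$ grows as the signal level $\alpha_t$ shrinks, a loss on $x_0$ emphasizes the noisier timesteps relative to a uniform noise-prediction loss under this parameterization.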

Code & Models

Repositories

CEA-LIST/x0-supervision (GitHub)
