E2ED^2: Direct Mapping from Noise to Data for Enhanced Diffusion Models
Zhiyu Tan, WenXu Qian, Hesen Chen, Mengping Yang, Lei Chen, Hao Li

TL;DR
E2ED^2 introduces an end-to-end training approach for diffusion models that maps noise directly to data. It improves training efficiency and stability and allows advanced loss functions to be integrated, leading to better generative performance.
Contribution
The paper proposes an end-to-end differentiable diffusion framework that addresses training-inference mismatch and information leakage, and that allows perceptual and adversarial losses to be integrated into training.
Findings
Achieves lower FID and higher CLIP scores on benchmarks
Requires fewer sampling steps for high-quality generation
Combines diffusion and GAN-like optimization benefits
Abstract
Diffusion models have established themselves as the de facto paradigm in visual generative modeling, with remarkable success across diverse applications ranging from high-quality image synthesis to temporally aware video generation. Despite these advances, three fundamental limitations persist: 1) a discrepancy between the training and inference processes, 2) progressive information leakage throughout the noise corruption procedure, and 3) inherent constraints that prevent effective integration of modern optimization criteria such as perceptual and adversarial losses. To mitigate these challenges, we present a novel end-to-end learning paradigm that establishes direct optimization from the final generated samples to the initial noise. Our proposed End-to-End Differentiable Diffusion, dubbed E2ED^2, introduces several key…
Taxonomy
Topics: Advanced Mathematical Modeling in Engineering
Methods: Diffusion · Contrastive Language-Image Pre-training
