NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song

TL;DR
NitroFusion introduces a dynamic adversarial training framework with specialized discriminator pools and multi-scale assessment to achieve high-fidelity single-step diffusion, enabling flexible quality-speed trade-offs.
Contribution
The paper presents a novel dynamic adversarial approach with specialized discriminators and flexible deployment for improved single-step diffusion quality.
Findings
Outperforms existing single-step methods in quality metrics
Preserves fine details and global consistency effectively
Supports dynamic quality-speed trade-offs with 1-4 denoising steps
Abstract
We introduce NitroFusion, a fundamentally different approach to single-step diffusion that achieves high-quality generation through a dynamic adversarial framework. While one-step methods offer dramatic speed advantages, they typically suffer from quality degradation compared to their multi-step counterparts. Just as a panel of art critics provides comprehensive feedback by specializing in different aspects like composition, color, and technique, our approach maintains a large pool of specialized discriminator heads that collectively guide the generation process. Each discriminator group develops expertise in specific quality aspects at different noise levels, providing diverse feedback that enables high-fidelity one-step generation. Our framework combines: (i) a dynamic discriminator pool with specialized discriminator groups to improve generation quality, (ii) strategic refresh…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpectroscopy Techniques in Biomedical and Chemical Research
MethodsDiffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
