Neural Residual Diffusion Models for Deep Scalable Vision Generation
Zhiyuan Ma, Liangliang Zhao, Biqing Qi, Bowen Zhou

TL;DR
This paper introduces Neural Residual Diffusion Models, a scalable framework that enhances deep vision generation by incorporating learnable residual parameters, leading to state-of-the-art results in image and video synthesis.
Contribution
It proposes a novel neural residual architecture with gated residuals that align with diffusion dynamics, enabling scalable training and improved generative quality.
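To make the idea of a gated residual concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the update form `h + alpha * f(h) + beta`, the toy block `f(h) = tanh(h @ W)`, and the names `gated_residual_stack`, `alphas`, and `betas` are all hypothetical and do not come from the summary above. The gates are fixed constants here, whereas in a real model they would be learned.

```python
import numpy as np


def gated_residual_stack(x, weights, alphas, betas):
    """Apply a stack of gated residual blocks: h <- h + alpha * f(h) + beta.

    f(h) = tanh(h @ W) is a toy stand-in for a U-Net or Transformer block.
    alpha and beta are per-block scalar gates (hypothetical names).
    """
    h = x
    for W, alpha, beta in zip(weights, alphas, betas):
        h = h + alpha * np.tanh(h @ W) + beta
    return h


rng = np.random.default_rng(0)
depth, dim = 64, 8
x = rng.normal(size=(4, dim))
weights = [rng.normal(size=(dim, dim)) for _ in range(depth)]

# With every gate at zero, each block reduces to the identity map.
identity_out = gated_residual_stack(x, weights, np.zeros(depth), np.zeros(depth))

# Small gates bound how far the signal can drift across a deep stack:
# |tanh| <= 1, so each block moves any entry by at most alpha.
gated_out = gated_residual_stack(x, weights, np.full(depth, 0.01), np.zeros(depth))

# Ungated residuals (alpha = 1) for comparison.
plain_out = gated_residual_stack(x, weights, np.ones(depth), np.zeros(depth))

print("max drift, gated:", np.max(np.abs(gated_out - x)))
print("max drift, plain:", np.max(np.abs(plain_out - x)))
```

In this toy setting the gate caps the per-block change, which is one way to picture why gating could help keep numerical error under control as depth grows.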
Findings
Achieves state-of-the-art scores on image and video benchmarks.
Demonstrates improved fidelity and consistency in generated content.
Supports large-scale scalable training of deep generative models.
Abstract
The most advanced diffusion models have recently adopted increasingly deep stacked networks (e.g., U-Net or Transformer) to promote emergent generative capabilities in vision generation models, similar to those of large language models (LLMs). However, progressively deeper stacked networks intuitively cause numerical propagation errors and reduce noisy prediction capabilities on generative data, which hinders massively deep scalable training of vision generation models. In this paper, we first show that neural networks can perform generative denoising effectively because the intrinsic residual unit has dynamic properties consistent with the reverse diffusion process of the input signal, which supports excellent generative abilities. Afterwards, we stand on the shoulders of two common types of deep stacked networks to propose a unified and massively…
Taxonomy
Topics: Advanced Neural Network Applications
Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net · Diffusion
