MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize
Haohang Xu, Longyu Chen, Yichen Zhang, Shuangrui Ding, Zhipeng Zhang

TL;DR
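This paper introduces MSF, a multi-scale latent factorization framework for diffusion models. It decomposes the denoising target (typically latent features from a pretrained VAE) into a low-frequency base signal and a high-frequency residual signal, enabling faster high-resolution image generation with comparable or better quality.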
Contribution
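The paper proposes a multi-scale latent factorization approach that splits the denoising target into a low-frequency base component, which carries core structure, and a high-frequency residual component, which carries fine detail such as texture. This reduces the number of sampling steps and improves the efficiency of diffusion models.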
Findings
Achieves state-of-the-art FID scores on ImageNet benchmarks.
Provides a 4x speed-up in sampling with comparable or better quality.
Outperforms baseline models like DiT in both quality and efficiency.
Abstract
While diffusion-based generative models have made significant strides in visual content creation, conventional approaches face computational challenges, especially for high-resolution images, as they denoise the entire image from noisy inputs. This contrasts with signal processing techniques, such as Fourier and wavelet analyses, which often employ hierarchical decompositions. Inspired by such principles, particularly the idea of signal separation, we introduce a diffusion framework leveraging multi-scale latent factorization. Our framework uniquely decomposes the denoising target, typically latent features from a pretrained Variational Autoencoder, into a low-frequency base signal capturing core structural information and a high-frequency residual signal that contributes finer, high-frequency details like textures. This decomposition into base and residual components directly informs…
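The base/residual split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`upsample`, `factorize_latent`, `reconstruct`), the use of average pooling for the low-frequency base, nearest-neighbour upsampling, and the scale factor of 2 are all assumptions chosen for illustration, since the summary does not specify which operators the paper uses.

```python
import numpy as np


def upsample(base: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) array (assumed operator)."""
    return base.repeat(factor, axis=1).repeat(factor, axis=2)


def factorize_latent(latent: np.ndarray, factor: int = 2):
    """Split a (C, H, W) latent into a low-resolution base and a residual.

    Assumption: average pooling stands in for whatever low-pass /
    downsampling operator the paper actually uses.
    """
    c, h, w = latent.shape
    if h % factor or w % factor:
        raise ValueError("H and W must be divisible by factor")
    # Base: mean over non-overlapping factor x factor blocks (coarse structure).
    base = latent.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))
    # Residual: the detail that the upsampled base does not explain.
    residual = latent - upsample(base, factor)
    return base, residual


def reconstruct(base: np.ndarray, residual: np.ndarray, factor: int = 2) -> np.ndarray:
    """Recombine the two components into the full-resolution latent."""
    return upsample(base, factor) + residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for a VAE latent; the shape is illustrative only.
    z = rng.standard_normal((4, 32, 32))
    base, residual = factorize_latent(z, factor=2)
    print(base.shape, residual.shape)
    print(np.allclose(reconstruct(base, residual, factor=2), z))
```

Under these assumptions the base has a quarter of the spatial positions of the full latent, and adding the upsampled base to the residual recovers the original latent exactly. How the paper assigns denoising steps to each component is not stated in the truncated abstract above, so the sketch stops at the decomposition itself.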
Taxonomy
Topics: Image Retrieval and Classification Techniques · Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion · Balanced Selection
