MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize

Haohang Xu; Longyu Chen; Yichen Zhang; Shuangrui Ding; Zhipeng Zhang

arXiv:2501.13349 · cs.CV · July 1, 2025

TL;DR

This paper introduces MSF, a multi-scale latent factorization framework for diffusion models that decomposes images into base and residual signals, enabling faster high-resolution image generation with improved quality.

Contribution

The paper proposes a novel multi-scale latent factorization approach that decomposes signals into base and residual components, reducing sampling steps and improving efficiency in diffusion models.
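The base/residual split can be illustrated with a minimal NumPy sketch. The operators are assumptions for illustration and are not taken from the paper: the base is a 2x average pooling of a (C, H, W) latent, upsampling is nearest-neighbour, and the residual is whatever the upsampled base fails to capture. The helper names `factorize` and `reconstruct` are hypothetical.

```python
import numpy as np


def factorize(latent: np.ndarray, factor: int = 2):
    """Split a (C, H, W) latent into a low-resolution base and a full-resolution residual.

    Hypothetical helper. Assumption: base = non-overlapping average pooling by `factor`,
    upsampling = nearest-neighbour repetition.
    """
    c, h, w = latent.shape
    assert h % factor == 0 and w % factor == 0, "spatial size must be divisible by factor"
    # Average each factor x factor block -> coarse, low-frequency base
    base = latent.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))
    # Bring the base back to full resolution by repeating each value in its block
    upsampled = base.repeat(factor, axis=1).repeat(factor, axis=2)
    # Detail the base cannot represent -> high-frequency residual
    residual = latent - upsampled
    return base, residual


def reconstruct(base: np.ndarray, residual: np.ndarray, factor: int = 2) -> np.ndarray:
    """Invert `factorize`: upsample the base and add the residual back."""
    return base.repeat(factor, axis=1).repeat(factor, axis=2) + residual


# Example: a random 4-channel 32x32 array standing in for a VAE latent
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 32, 32))
base, residual = factorize(z)
print(base.shape, residual.shape)  # (4, 16, 16) (4, 32, 32)
print(np.allclose(reconstruct(base, residual), z))  # True
```

By construction the residual averages to zero inside every pooled block, so under these assumptions it carries only the fine variation, while the upsampled base plus the residual recovers the original latent.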

Findings

01. Achieves state-of-the-art FID scores on ImageNet benchmarks.
02. Provides a 4x speed-up in sampling with comparable or better quality.
03. Outperforms baseline models such as DiT in both quality and efficiency.

Abstract

While diffusion-based generative models have made significant strides in visual content creation, conventional approaches face computational challenges, especially for high-resolution images, as they denoise the entire image from noisy inputs. This contrasts with signal processing techniques, such as Fourier and wavelet analyses, which often employ hierarchical decompositions. Inspired by such principles, particularly the idea of signal separation, we introduce a diffusion framework leveraging multi-scale latent factorization. Our framework uniquely decomposes the denoising target, typically latent features from a pretrained Variational Autoencoder, into a low-frequency base signal capturing core structural information and a high-frequency residual signal that contributes finer, high-frequency details like textures. This decomposition into base and residual components directly informs…
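To see why denoising the entire high-resolution latent is expensive, it helps to count the tokens a DiT-style transformer denoiser processes. The sketch below assumes a VAE with 8x spatial downsampling and a patch size of 2 (the setting used by DiT-XL/2). The function names are hypothetical, and treating the base as a half-resolution latent is an illustrative assumption, not the paper's stated configuration. Self-attention cost grows roughly with the square of the token count; the linear MLP and projection terms are ignored here.

```python
def num_tokens(image_size: int, vae_factor: int = 8, patch: int = 2) -> int:
    """Tokens seen by a DiT-style denoiser for a square image (hypothetical helper).

    Assumptions: the VAE downsamples by `vae_factor`, and the transformer
    splits the latent into non-overlapping `patch` x `patch` patches.
    """
    latent_side = image_size // vae_factor
    return (latent_side // patch) ** 2


def relative_attention_cost(tokens_a: int, tokens_b: int) -> float:
    """Ratio of self-attention cost, which scales as N^2 for N tokens."""
    return (tokens_a / tokens_b) ** 2


full = num_tokens(512)  # 512 px -> 64x64 latent -> 32x32 patches
half = num_tokens(256)  # a latent at half the side length -> 16x16 patches
print(full, half)  # 1024 256
print(relative_attention_cost(full, half))  # 16.0
```

Under these assumptions, halving the latent side length cuts the token count by 4x and the attention cost by about 16x per step, which is why working on a coarse base rather than the full latent is attractive.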

Taxonomy

Topics: Image Retrieval and Classification Techniques · Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Balanced Selection