Decoupling Complexity from Scale in Latent Diffusion Model

Tianxiong Zhong; Xingye Tian; Xuebo Wang; Boyuan Jiang; Xin Tao; Pengfei Wan

arXiv:2511.16117·cs.CV·November 21, 2025

Open Access

TL;DR

DCS-LDM introduces a hierarchical, scale-independent latent space for visual generation, decoupling content complexity from scale to enable flexible, high-quality image and video synthesis at various resolutions and frame rates.

Contribution

The paper proposes a hierarchical latent space that models content complexity independently of scale, allowing latent diffusion models to generate at flexible resolutions and frame rates.

Findings

01. Achieves performance comparable to state-of-the-art methods.

02. Supports arbitrary resolutions and frame rates within a fixed latent space.

03. Enables progressive coarse-to-fine generation.

Abstract

Existing latent diffusion models typically couple scale with content complexity, using more latent tokens to represent higher-resolution images or higher-frame-rate videos. However, the latent capacity required to represent visual data primarily depends on content complexity, with scale serving only as an upper bound. Motivated by this observation, we propose DCS-LDM, a novel paradigm for visual generation that decouples information complexity from scale. DCS-LDM constructs a hierarchical, scale-independent latent space that models sample complexity through multi-level tokens and supports decoding to arbitrary resolutions and frame rates within a fixed latent representation. This latent space enables DCS-LDM to achieve a flexible computation-quality tradeoff. Furthermore, by decomposing structural and detailed information across levels, DCS-LDM supports a progressive coarse-to-fine…
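To make the coupling described in the abstract concrete, the sketch below compares token counts under two schemes. It is an illustration only, not the paper's implementation: the stride of 16 (for example, an 8x autoencoder followed by 2x patchification), the per-level budgets in `LEVEL_TOKENS`, and both function names are hypothetical, and the idea that simpler content uses fewer levels is an assumption drawn from the abstract's "computation-quality tradeoff" wording.

```python
# Illustrative arithmetic only; all constants and names here are assumptions,
# not values or APIs taken from the DCS-LDM paper.

def coupled_token_count(height, width, frames=1,
                        spatial_stride=16, temporal_stride=1):
    """Token count for a scale-coupled latent grid.

    One token per (spatial_stride x spatial_stride) pixel cell and per
    temporal_stride frames, so the count grows with resolution and length.
    """
    grid_h = height // spatial_stride
    grid_w = width // spatial_stride
    grid_t = max(1, frames // temporal_stride)
    return grid_h * grid_w * grid_t


# Hypothetical token budget per hierarchy level, coarse to fine.
LEVEL_TOKENS = (64, 192, 768)


def hierarchical_token_count(levels_used):
    """Token count when only the first `levels_used` levels are generated.

    The count depends on how many levels are used (a proxy for content
    complexity), not on the resolution the latent is later decoded to.
    """
    if not 1 <= levels_used <= len(LEVEL_TOKENS):
        raise ValueError("levels_used must be between 1 and len(LEVEL_TOKENS)")
    return sum(LEVEL_TOKENS[:levels_used])


if __name__ == "__main__":
    for side in (256, 512, 1024):
        print(f"coupled, {side}x{side}: {coupled_token_count(side, side)} tokens")
    for levels in (1, 2, 3):
        print(f"hierarchical, {levels} level(s): "
              f"{hierarchical_token_count(levels)} tokens")
```

Under these assumed numbers, doubling the image side quadruples the coupled token count, while the hierarchical count changes only with the number of levels used.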

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Image and Video Quality Assessment