LUVE : Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts

Chen Zhao; Jiawei Chen; Hongyu Li; Zhuoliang Kang; Shilin Lu; Xiaoming Wei; Kai Zhang; Jian Yang; Ying Tai

arXiv:2602.11564·cs.CV·February 13, 2026

TL;DR

LUVE is a novel framework for ultra-high-resolution video generation that uses a three-stage latent-cascaded architecture with dual frequency experts to improve realism and detail.

Contribution

The paper introduces LUVE, a three-stage latent-cascaded UHR video generation framework that uses dual frequency experts to improve semantic coherence and fine detail.

Findings

01 Achieves superior photorealism in UHR videos

02 Effective resolution upsampling in latent space reduces computational costs

03 Component ablations confirm each part's contribution to quality
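
As a rough illustration of why upsampling in latent space is cheaper than upsampling in pixel space, the sketch below counts tensor elements for one UHR clip in both representations. The clip size (80 frames at 3840×2160) and the autoencoder factors (8× spatial, 4× temporal, 16 latent channels) are assumptions chosen for illustration, not values taken from the paper, and element counts are only a proxy for memory and compute.

```python
def pixel_elements(frames, height, width, channels=3):
    """Number of values in an RGB video tensor."""
    return frames * height * width * channels


def latent_elements(frames, height, width,
                    spatial_factor=8, temporal_factor=4, latent_channels=16):
    """Number of values after a hypothetical video autoencoder.

    All three compression settings are illustrative assumptions,
    not the configuration used by LUVE.
    """
    return ((frames // temporal_factor)
            * (height // spatial_factor)
            * (width // spatial_factor)
            * latent_channels)


# Assumed clip: 80 frames at 3840x2160 (4K UHD).
pixels = pixel_elements(80, 2160, 3840)
latents = latent_elements(80, 2160, 3840)
ratio = pixels / latents

print(f"pixel-space elements:  {pixels:,}")
print(f"latent-space elements: {latents:,}")
print(f"reduction factor:      {ratio:.0f}x")
```

Under these assumed factors the latent tensor holds 48 times fewer values than the pixel tensor, which is the kind of saving that motivates keeping the upsampling step in latent space.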

Abstract

Recent advances in video diffusion models have significantly improved visual quality, yet ultra-high-resolution (UHR) video generation remains a formidable challenge due to the compounded difficulties of motion modeling, semantic planning, and detail synthesis. To address these limitations, we propose LUVE, a Latent-cascaded UHR Video generation framework built upon dual frequency Experts. LUVE employs a three-stage architecture comprising low-resolution motion generation for motion-consistent latent synthesis, video latent upsampling that performs resolution upsampling directly in the latent space to mitigate memory and computational overhead, and high-resolution content refinement that integrates low-frequency and high-frequency experts to jointly enhance semantic coherence and fine-grained detail generation. Extensive experiments…
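
The three-stage data flow described in the abstract can be sketched in NumPy as follows. This is a toy illustration of the shapes and of one possible frequency split, not the authors' method: the random stage-1 latent, the nearest-neighbour upsampler, the FFT low-pass split, and the identity "experts" are all placeholders assumed here, and every function name is hypothetical.

```python
import numpy as np


def upsample_latent(latent, scale):
    """Nearest-neighbour spatial upsampling of a (T, C, H, W) latent.

    Stand-in for a learned latent upsampler (assumption).
    """
    return latent.repeat(scale, axis=2).repeat(scale, axis=3)


def split_frequencies(latent, cutoff):
    """Split each frame/channel into low- and high-frequency parts.

    cutoff is a radius in cycles per sample (at most 0.5 on each axis).
    The FFT low-pass mask is an illustrative choice, not taken from the paper.
    """
    h, w = latent.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_pass = (np.sqrt(fy ** 2 + fx ** 2) <= cutoff).astype(latent.dtype)
    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * low_pass, axes=(-2, -1)).real
    high = latent - low  # residual, so low + high reconstructs the input
    return low, high


def generate(rng, frames=4, channels=8, height=6, width=10, scale=2,
             cutoff=0.15, low_expert=lambda x: x, high_expert=lambda x: x):
    """Toy three-stage pipeline with placeholder components."""
    # Stage 1: low-resolution latent (random noise stands in for a generator).
    low_res = rng.standard_normal((frames, channels, height, width))
    # Stage 2: upsample directly in latent space.
    upsampled = upsample_latent(low_res, scale)
    # Stage 3: refine each frequency band with its own expert, then recombine.
    low, high = split_frequencies(upsampled, cutoff)
    return low_expert(low) + high_expert(high)


refined = generate(np.random.default_rng(0))
print(refined.shape)
```

Because the high band is defined as the residual of the low band, identity experts return the upsampled latent unchanged; in a real system each placeholder would be replaced by a trained model.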

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Advanced Vision and Imaging