LatSearch: Latent Reward-Guided Search for Faster Inference-Time Scaling in Video Diffusion

Zengqun Zhao; Ziquan Liu; Yu Cao; Shaogang Gong; Zhensong Zhang; Jifei Song; Jiankang Deng; Ioannis Patras

arXiv:2603.14526 · cs.CV · March 17, 2026

Open Access

TL;DR

LatSearch is an inference-time search method for video diffusion guided by a latent reward model. The reward model gives feedback at intermediate points of the denoising process rather than only on the final decoded video, which improves both video quality and search efficiency.

Contribution

This work presents LatSearch, a new latent reward-guided search algorithm that improves inference-time scaling in video diffusion by reducing computational costs and increasing controllability.

Findings

01. Consistently improves video quality on the VBench-2.0 benchmark.

02. Enhances controllability and sample efficiency in video diffusion.

03. Outperforms the baseline Wan2.1 model across multiple metrics.

Abstract

The recent success of inference-time scaling in large language models has inspired similar explorations in video diffusion. In particular, motivated by the existence of "golden noise" that enhances video quality, prior work has attempted to improve inference by optimising or searching for better initial noise. However, these approaches have notable limitations: they either rely on priors imposed at the beginning of noise sampling or on rewards evaluated only on the denoised and decoded videos. This leads to error accumulation, delayed and sparse reward signals, and prohibitive computational cost, which prevents the use of stronger search algorithms. Crucially, stronger search algorithms are precisely what could unlock substantial gains in controllability, sample efficiency and generation quality for video diffusion, provided their computational cost can be reduced. To fill in this gap,…
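To make the contrast in the abstract concrete, here is a minimal toy sketch. It is not the paper's algorithm, whose details are elided above. It compares a best-of-N search that scores candidates only after full denoising with a search that scores intermediate latents and prunes weak candidates early. Everything in it is an assumption made for illustration: `denoise_step`, `latent_reward`, `best_of_n`, and `pruned_search` are hypothetical names, the "latents" are short lists of floats, and the toy dynamics are chosen so that early rankings match final rankings, which a real diffusion model does not guarantee.

```python
import random

def denoise_step(latent):
    # Toy stand-in (assumption) for one denoising step of a diffusion model.
    return [0.9 * v for v in latent]

def latent_reward(latent):
    # Hypothetical reward scored directly on a latent, with no decoding.
    return -sum(v * v for v in latent)

def best_of_n(noises, num_steps):
    """Baseline: fully denoise every candidate and score only at the end."""
    calls = 0
    finals = []
    for z in noises:
        for _ in range(num_steps):
            z = denoise_step(z)
            calls += 1
        finals.append(z)
    return max(finals, key=latent_reward), calls

def pruned_search(noises, num_steps, prune_every):
    """Score intermediate latents and keep the top half at each checkpoint."""
    calls = 0
    pool = list(noises)
    for step in range(1, num_steps + 1):
        pool = [denoise_step(z) for z in pool]
        calls += len(pool)
        if step % prune_every == 0 and len(pool) > 1:
            pool.sort(key=latent_reward, reverse=True)
            pool = pool[: max(1, len(pool) // 2)]
    return max(pool, key=latent_reward), calls

rng = random.Random(0)
noises = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]

best_a, calls_a = best_of_n(noises, num_steps=8)
best_b, calls_b = pruned_search(noises, num_steps=8, prune_every=2)
print(calls_a, calls_b)
```

With 8 candidates and 8 steps, the baseline spends 64 denoising calls, while the pruned search spends 30 and, in this toy setting, returns the same winner. The saving comes entirely from discarding candidates before they are fully denoised, which is only safe to the extent that an intermediate reward predicts final quality.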

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Speech and Audio Processing