LatSearch: Latent Reward-Guided Search for Faster Inference-Time Scaling in Video Diffusion
Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, Zhensong Zhang, Jifei Song, Jiankang Deng, Ioannis Patras

TL;DR
LatSearch is an inference-time search method for video diffusion guided by a latent reward model. By providing intermediate feedback during the denoising process, it improves both video quality and efficiency.
Contribution
This work presents LatSearch, a new latent reward-guided search algorithm that improves inference-time scaling in video diffusion by reducing computational costs and increasing controllability.
Findings
Consistently improves video quality on the VBench-2.0 benchmark.
Enhances controllability and sample efficiency in video diffusion.
Outperforms the baseline Wan2.1 model across multiple metrics.
Abstract
The recent success of inference-time scaling in large language models has inspired similar explorations in video diffusion. In particular, motivated by the existence of "golden noise" that enhances video quality, prior work has attempted to improve inference by optimising or searching for better initial noise. However, these approaches have notable limitations: they either rely on priors imposed at the beginning of noise sampling or on rewards evaluated only on the denoised and decoded videos. This leads to error accumulation, delayed and sparse reward signals, and prohibitive computational cost, which prevents the use of stronger search algorithms. Crucially, stronger search algorithms are precisely what could unlock substantial gains in controllability, sample efficiency and generation quality for video diffusion, provided their computational cost can be reduced. To fill in this gap,…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Speech and Audio Processing
