$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement
Yuzhu Wang, Xi Ye, Duo Su, Yangyang Xu, Jun Zhu

TL;DR
$h$-control is a training-free camera control method for pretrained video generators. It refines latent video representations through a block-conditional pseudo-Gibbs process, improving visual quality and robustness.
Contribution
The paper proposes a modification to the sampler that adds pseudo-Gibbs refinement and patch-based convergence acceleration for training-free camera-controlled video generation.
Findings
Outperforms all training-free methods on RealEstate10K and DAVIS datasets.
Achieves the best FVD scores among all tested methods.
Demonstrates robust convergence and improved visual quality.
Abstract
Training-free camera control for pretrained flow-matching video generators is a partial-observation inverse problem: a depth-warped guidance video supplies noisy evidence on a subset of latent sites, which the sampler must reconcile with the pretrained prior. Existing methods struggle to balance trajectory adherence against visual quality, and their heuristic guidance-strength tuning lacks robustness. We propose $h$-control, which resolves this dilemma through a structural change to the sampler: each outer hard-replacement guidance step is augmented with an inner-loop block-conditional pseudo-Gibbs refinement on the unobserved complement at the same noise level, with provable convergence to the partial-observation conditional data law. To accelerate convergence on high-dimensional video latents, we exploit their conditional locality, partitioning the…
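The two-level structure described in the abstract (hard replacement on the observed sites, then block-wise refinement of the unobserved complement) can be illustrated with a toy analogue. The sketch below is not the paper's sampler: it assumes a small zero-mean Gaussian prior with a known precision matrix, so each block can be resampled from its exact conditional, whereas the paper uses a pretrained flow-matching model and a pseudo-Gibbs step at each noise level. The function name `block_gibbs_refine`, the chain-structured precision matrix, the block size, and all numeric values are hypothetical choices made for illustration.

```python
import numpy as np

def block_gibbs_refine(x, precision, blocks, n_sweeps, rng):
    """Toy block Gibbs sweep for a zero-mean Gaussian with given precision.

    Each block is resampled from its exact conditional given all other
    sites. Observed sites never appear in `blocks`, so they stay clamped.
    """
    x = x.copy()
    all_sites = np.arange(x.size)
    for _ in range(n_sweeps):
        for block in blocks:
            rest = np.setdiff1d(all_sites, block)
            q_bb = precision[np.ix_(block, block)]
            q_br = precision[np.ix_(block, rest)]
            cov = np.linalg.inv(q_bb)         # conditional covariance
            mean = -cov @ (q_br @ x[rest])    # conditional mean
            x[block] = rng.multivariate_normal(mean, cov)
    return x

rng = np.random.default_rng(0)
n = 8
# Assumed prior: chain-structured (tridiagonal) precision, i.e. local coupling.
precision = 2.0 * np.eye(n) - 0.9 * (np.eye(n, k=1) + np.eye(n, k=-1))

observed = np.array([0, 4])               # sites with evidence
observed_values = np.array([1.0, -1.0])
unobserved = np.setdiff1d(np.arange(n), observed)
# Partition the unobserved complement into blocks of two sites each.
blocks = [unobserved[i:i + 2] for i in range(0, unobserved.size, 2)]

x = rng.standard_normal(n)
x[observed] = observed_values             # outer step: hard replacement

samples = []
for _ in range(3000):                     # inner loop: refinement sweeps
    x = block_gibbs_refine(x, precision, blocks, 1, rng)
    samples.append(x[unobserved])
empirical_mean = np.mean(samples[500:], axis=0)  # discard burn-in

# Closed-form conditional mean of the unobserved sites, for comparison.
q_uu = precision[np.ix_(unobserved, unobserved)]
q_uo = precision[np.ix_(unobserved, observed)]
exact_mean = -np.linalg.solve(q_uu, q_uo @ observed_values)

print("empirical:", np.round(empirical_mean, 2))
print("exact:    ", np.round(exact_mean, 2))
```

In this toy setting the chain average approaches the closed-form conditional mean while the observed sites stay fixed, which is the behavior the convergence claim describes for the partial-observation conditional law. The local coupling of the precision matrix plays the role that conditional locality plays in the abstract: each block depends only on nearby sites.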
