Generative View Stitching
Chonghyuk Song, Michal Stary, Boyuan Chen, George Kopanas, Vincent Sitzmann

TL;DR
Generative View Stitching (GVS) is a parallel sampling algorithm for camera-guided video generation. It keeps the generated scene faithful to a predefined camera trajectory while maintaining temporal consistency and long-range coherence.
Contribution
GVS extends diffusion stitching, previously used for robot planning, to video generation. It works with any off-the-shelf video model trained with Diffusion Forcing, and it introduces Omni Guidance to strengthen temporal consistency and enable loop closure.
Findings
GVS produces stable, collision-free, and consistent videos along predefined paths.
Omni Guidance improves temporal coherence and enables loop closure in generated videos.
Results cover complex camera trajectories such as the Impossible Staircase.
Abstract
Autoregressive video diffusion models are capable of long rollouts that are stable and consistent with history, but they are unable to guide the current generation with conditioning from the future. In camera-guided video generation with a predefined camera trajectory, this limitation leads to collisions with the generated scene, after which autoregression quickly collapses. To address this, we propose Generative View Stitching (GVS), which samples the entire sequence in parallel such that the generated scene is faithful to every part of the predefined camera trajectory. Our main contribution is a sampling algorithm that extends prior work on diffusion stitching for robot planning to video generation. While such stitching methods usually require a specially trained model, GVS is compatible with any off-the-shelf video model trained with Diffusion Forcing, a prevalent sequence diffusion…
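The idea of sampling the whole sequence in parallel, rather than frame by frame, can be illustrated with a toy sketch. The code below is not the authors' algorithm: it assumes a generic stitching scheme in which overlapping windows are denoised independently at each step and the overlapping frames are then averaged so neighboring windows agree. The names `chunk_starts`, `toy_denoiser`, and `stitched_sample` are hypothetical, and `toy_denoiser` is a stand-in that simply pulls frames toward a per-frame conditioning signal (playing the role of the camera trajectory) instead of calling a real video model.

```python
import numpy as np


def chunk_starts(num_frames, window, stride):
    """Start indices of overlapping windows that together cover every frame."""
    if window > num_frames:
        raise ValueError("window must not exceed num_frames")
    starts = list(range(0, num_frames - window + 1, stride))
    # Add a final window flush with the end if the stride left a gap.
    if starts[-1] != num_frames - window:
        starts.append(num_frames - window)
    return starts


def toy_denoiser(noisy_chunk, cond_chunk, step_frac):
    """Hypothetical stand-in for a video diffusion model (assumption).

    Interpolates the noisy chunk toward its conditioning; step_frac goes
    from near 0 (mostly noise) to 1 (fully denoised).
    """
    return (1.0 - step_frac) * noisy_chunk + step_frac * cond_chunk


def stitched_sample(cond, window=8, stride=4, num_steps=20, seed=0):
    """Denoise all windows at every step, then average overlapping frames.

    Averaging is an illustrative simplification, not the GVS update rule.
    """
    rng = np.random.default_rng(seed)
    num_frames = cond.shape[0]
    x = rng.standard_normal(cond.shape)  # whole sequence starts as noise
    starts = chunk_starts(num_frames, window, stride)

    for step in range(1, num_steps + 1):
        step_frac = step / num_steps
        total = np.zeros_like(x)
        count = np.zeros((num_frames,) + (1,) * (x.ndim - 1))
        # Each window sees only its own frames; in a real system these
        # calls are independent and could run as one batch.
        for s in starts:
            sl = slice(s, s + window)
            total[sl] += toy_denoiser(x[sl], cond[sl], step_frac)
            count[sl] += 1
        # Stitch: frames shared by several windows take the mean estimate.
        x = total / count
    return x


if __name__ == "__main__":
    # Hypothetical conditioning: 18 frames, 3 values per frame.
    cond = np.linspace(0.0, 1.0, 18)[:, None] * np.ones((1, 3))
    video = stitched_sample(cond)
    print(video.shape)
```

Because every window is updated at every step, frames late in the sequence influence their neighbors from the first step onward, which is the property that a purely autoregressive rollout lacks.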
