BulletGen: Improving 4D Reconstruction with Bullet-Time Generation
Denis Rozumny, Jonathon Luiten, Numair Khan, Johannes Schönberger, Peter Kontschieder

TL;DR
BulletGen uses generative models to improve 4D scene reconstruction from monocular videos, correcting reconstruction errors and filling in missing information to improve novel-view synthesis and tracking.
Contribution
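It introduces a method that aligns the output of a diffusion-based video generation model with a Gaussian-based 4D reconstruction at a single frozen "bullet-time" step, then uses the generated frames to supervise the optimization of the 4D Gaussian model so that errors are corrected and missing regions are completed.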
Findings
Achieves state-of-the-art results on novel-view synthesis.
Achieves state-of-the-art results on 2D and 3D tracking tasks.
Blends generative content with both static and dynamic scene components.
Abstract
Transforming casually captured monocular videos into fully immersive dynamic experiences is a highly ill-posed task that comes with significant challenges, e.g., reconstructing unseen regions and dealing with the ambiguity in monocular depth estimation. In this work we introduce BulletGen, an approach that takes advantage of generative models to correct errors and complete missing information in a Gaussian-based dynamic scene representation. This is done by aligning the output of a diffusion-based video generation model with the 4D reconstruction at a single frozen "bullet-time" step. The generated frames are then used to supervise the optimization of the 4D Gaussian model. Our method seamlessly blends generative content with both static and dynamic scene components, achieving state-of-the-art results on both novel-view synthesis and 2D/3D tracking tasks.
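The supervision idea in the abstract can be illustrated with a deliberately tiny toy example. This is a sketch under strong assumptions, not the authors' implementation: the "scene" is a plain list of intensities instead of a 4D Gaussian model, the hypothetical `render` function just reads the visible elements, and the "generated" view is assumed to agree with the true scene. All names (`render`, `fit`, `input_view`, `generated_view`) and the learning rate are invented for illustration. It shows only that regions unseen in the input stay unconstrained until frames from additional viewpoints at the same frozen time step supervise them.

```python
# Toy illustration only: the scene is a list of intensities and a view is a
# pair (visible_indices, target_pixels). Nothing here is the paper's code.

def render(scene, visible):
    """Hypothetical toy renderer: a view reads the scene elements it can see."""
    return [scene[i] for i in visible]

def fit(scene, views, lr=0.5, steps=60):
    """Gradient descent on the half squared error, averaged over the views."""
    scene = list(scene)
    for _ in range(steps):
        grad = [0.0] * len(scene)
        for visible, target in views:
            rendered = render(scene, visible)
            for i, r, t in zip(visible, rendered, target):
                grad[i] += r - t  # derivative of 0.5 * (r - t)**2 w.r.t. scene[i]
        scene = [s - lr * g / len(views) for s, g in zip(scene, grad)]
    return scene

true_scene = [0.2, 0.4, 0.6, 0.8]
initial = [0.0] * len(true_scene)

# The captured input frame only observes elements 0 and 1.
input_view = ([0, 1], [0.2, 0.4])
# Assumption: a generated frame at the same frozen time step covers elements 1..3
# and happens to match the true scene exactly.
generated_view = ([1, 2, 3], [0.4, 0.6, 0.8])

input_only = fit(initial, [input_view])                  # unseen elements never move
completed = fit(initial, [input_view, generated_view])   # unseen elements get filled in
```

With the input view alone, elements 2 and 3 receive no gradient and keep their initial value; adding the generated view constrains them. A real system would additionally have to align the generated frames with the reconstruction before using them as targets, which this toy skips entirely.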
