GFlow: Recovering 4D World from Monocular Video

Shizun Wang; Xingyi Yang; Qiuhong Shen; Zhenxiang Jiang; Xinchao Wang

arXiv:2405.18426 · cs.CV · January 3, 2025

TL;DR

GFlow is a novel framework that reconstructs 4D dynamic scenes from monocular videos without prior camera information, enabling scene understanding, object tracking, and view synthesis using only 2D priors.

Contribution

GFlow introduces a new method for 4D scene recovery from monocular videos without camera parameters, utilizing 2D priors and Gaussian flow modeling for dynamic scene reconstruction.

Findings

01 Successfully recovers 4D scenes from monocular videos.

02 Enables object tracking and scene editing.

03 Achieves accurate camera pose estimation.

Abstract

Recovering 4D world from monocular video is a crucial yet challenging task. Conventional methods usually rely on the assumptions of multi-view videos, known camera parameters, or static scenes. In this paper, we relax all these constraints and tackle a highly ambitious but practical task: With only one monocular video without camera parameters, we aim to recover the dynamic 3D world alongside the camera poses. To solve this, we introduce GFlow, a new framework that utilizes only 2D priors (depth and optical flow) to lift a video to a 4D scene, as a flow of 3D Gaussians through space and time. GFlow starts by segmenting the video into still and moving parts, then alternates between optimizing camera poses and the dynamics of the 3D Gaussian points. This method ensures consistency among adjacent points and smooth transitions between frames. Since dynamic scenes always continually…
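The abstract describes lifting a video to 3D using depth and optical flow priors and separating still from moving parts. The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes a pinhole camera with a guessed focal length and a principal point at the image center (GFlow itself has no camera parameters given). The names `unproject_depth` and `moving_mask`, the `threshold` parameter, and the simple flow-residual test are hypothetical choices made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Flow2D = Tuple[float, float]


def unproject_depth(depth: Sequence[Sequence[float]], focal: float) -> List[Point3D]:
    """Lift every pixel (u, v) with depth z to a camera-space 3D point."""
    height, width = len(depth), len(depth[0])
    # Assumption: principal point at the image center, square pixels.
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    points: List[Point3D] = []
    for v in range(height):
        for u in range(width):
            z = depth[v][u]
            # Pinhole back-projection: x = (u - cx) * z / f, y = (v - cy) * z / f.
            points.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
    return points


def moving_mask(
    observed: Sequence[Flow2D],
    expected: Sequence[Flow2D],
    threshold: float = 1.0,
) -> List[bool]:
    """Flag a pixel as moving when its observed flow differs from the
    camera-induced (expected) flow by more than `threshold` pixels.
    This residual test is an illustrative stand-in, not the paper's method."""
    return [
        math.hypot(ox - ex, oy - ey) > threshold
        for (ox, oy), (ex, ey) in zip(observed, expected)
    ]


# Toy usage: a 2x2 depth map at constant depth, and two pixels of flow.
depth_map = [[2.0, 2.0], [2.0, 2.0]]
centers = unproject_depth(depth_map, focal=1.0)
mask = moving_mask([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```

In this sketch the unprojected points could serve as initial centers for 3D Gaussians, and the mask marks which of them would need their own motion rather than being explained by camera movement alone.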

Taxonomy

Topics: Multimedia Communication and Technology