Predicting 3D representations for Dynamic Scenes

Di Qi; Tong Yang; Beining Wang; Xiangyu Zhang; Wenqiang Zhang

arXiv:2501.16617·cs.CV·January 29, 2025

Open Access

TL;DR

This paper introduces a new framework for predicting explicit 3D dynamic scene representations from monocular videos, leveraging a triplane model and a 4D transformer for improved 4D scene understanding.

Contribution

It proposes a novel ego-centric unbounded triplane and a 4D-aware transformer for self-supervised dynamic scene modeling from monocular videos, advancing 3D scene prediction.

Findings

01

Achieves top results in dynamic radiance field prediction on NVIDIA dynamic scenes.

02

Demonstrates strong generalization to unseen scenarios.

03

Shows emergent capabilities for geometry and semantic learning.

Abstract

We present a novel framework for dynamic radiance field prediction given monocular video streams. Unlike previous methods that primarily focus on predicting future frames, our method goes a step further by generating explicit 3D representations of the dynamic scene. The framework builds on two core designs. First, we adopt an ego-centric unbounded triplane to explicitly represent the dynamic physical world. Second, we develop a 4D-aware transformer to aggregate features from monocular videos to update the triplane. Coupling these two designs enables us to train the proposed model with large-scale monocular videos in a self-supervised manner. Our model achieves top results in dynamic radiance field prediction on NVIDIA dynamic scenes, demonstrating its strong performance on 4D physical world modeling. Besides, our model shows a superior generalizability to unseen scenarios. Notably, we…
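The abstract does not say how the ego-centric unbounded triplane is parameterized. Purely as an illustration, the sketch below shows a generic triplane feature lookup in NumPy. It assumes a scene contraction in the style of Mip-NeRF 360 to fit unbounded coordinates onto finite planes, and it sums the three plane features. The function names `contract`, `sample_plane`, and `query_triplane`, the contraction itself, and the summation are all assumptions and may differ from the authors' design.

```python
import numpy as np

def contract(points):
    """Map unbounded 3D points into a ball of radius 2.

    Assumed contraction (Mip-NeRF 360 style); the source does not specify one.
    Points with norm <= 1 are unchanged; farther points are squashed.
    """
    norm = np.linalg.norm(points, axis=-1, keepdims=True)
    safe = np.maximum(norm, 1e-9)  # avoid division by zero at the origin
    squashed = (2.0 - 1.0 / safe) * (points / safe)
    return np.where(norm <= 1.0, points, squashed)

def sample_plane(plane, uv):
    """Bilinearly sample an (R, R, C) feature plane at uv in [-1, 1]^2."""
    res = plane.shape[0]
    xy = (uv + 1.0) * 0.5 * (res - 1)  # continuous grid coordinates
    lo = np.clip(np.floor(xy).astype(int), 0, res - 2)
    frac = xy - lo
    i, j = lo[:, 0], lo[:, 1]
    fi, fj = frac[:, :1], frac[:, 1:]
    return ((1 - fi) * (1 - fj) * plane[i, j]
            + fi * (1 - fj) * plane[i + 1, j]
            + (1 - fi) * fj * plane[i, j + 1]
            + fi * fj * plane[i + 1, j + 1])

def query_triplane(planes, points):
    """Return one feature vector per 3D point from (xy, xz, yz) planes."""
    p = contract(points) / 2.0  # rescale the radius-2 ball to [-1, 1]^3
    xy_plane, xz_plane, yz_plane = planes
    return (sample_plane(xy_plane, p[:, [0, 1]])
            + sample_plane(xz_plane, p[:, [0, 2]])
            + sample_plane(yz_plane, p[:, [1, 2]]))

# Hypothetical usage: random planes, points both near and far from the camera.
rng = np.random.default_rng(0)
planes = [rng.normal(size=(16, 16, 8)) for _ in range(3)]
points = np.array([[0.1, -0.2, 0.3], [1000.0, -50.0, 3.0]])
features = query_triplane(planes, points)  # shape (2, 8)
```

In a sketch like this the contraction is what makes the representation "unbounded": distant points still land on the planes, just at coarser effective resolution.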

Taxonomy

Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · 3D Surveying and Cultural Heritage

Methods: ADaptive gradient method with the OPTimal convergence rate · Focus