DynamicVGGT: Learning Dynamic Point Maps for 4D Scene Reconstruction in Autonomous Driving

Zhuolin He; Jing Li; Guanghao Li; Xiaolei Chen; Jiacheng Tang; Siyang Zhang; Zhounan Jin; Feipeng Cai; Bin Li; Jian Pu; Jia Cai; Xiangyang Xue

arXiv:2603.08254·cs.CV·March 10, 2026

TL;DR

DynamicVGGT is a feed-forward framework for 4D dynamic scene reconstruction in autonomous driving. It models temporal point motion and scene dynamics by jointly predicting current and future point maps, applying motion-aware temporal attention, and using Gaussian splatting.

Contribution

It extends VGGT, a static 3D perception model, to dynamic 4D reconstruction by jointly predicting current and future point maps, adding a Motion-aware Temporal Attention (MTA) module, and explicitly modeling point velocities with Gaussian splatting.
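
The summary does not say exactly how velocities enter the Gaussian representation. A common choice in dynamic Gaussian splatting, assumed for this sketch and not confirmed as the paper's formulation, gives each Gaussian a constant velocity so that its center moves linearly: μ(t) = μ₀ + v(t − t₀). The class `Gaussian` and method `center_at` are hypothetical names; covariance, opacity, color, and rendering are omitted.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical dynamic Gaussian: a center at reference time t0 plus a velocity."""

    mean: Vec3       # center position at reference time t0 (assumed shared world frame)
    velocity: Vec3   # assumed constant velocity, units per second
    t0: float = 0.0  # reference timestamp in seconds

    def center_at(self, t: float) -> Vec3:
        """Linear-motion assumption: mu(t) = mu0 + v * (t - t0)."""
        dt = t - self.t0
        return tuple(m + v * dt for m, v in zip(self.mean, self.velocity))


# Usage: a Gaussian on a car moving 10 m/s along +x, queried 0.5 s later.
g = Gaussian(mean=(2.0, 0.0, 1.0), velocity=(10.0, 0.0, 0.0))
print(g.center_at(0.5))  # (7.0, 0.0, 1.0)
```

Static background points are simply Gaussians with zero velocity, so one representation covers both static and moving content under this assumption.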

Findings

01. Outperforms existing methods in reconstruction accuracy

02. Achieves robust 4D scene reconstruction in complex scenarios

03. Effectively models dynamic point motion and scene flow

Abstract

Dynamic scene reconstruction in autonomous driving remains a fundamental challenge due to significant temporal variations, moving objects, and complex scene dynamics. Existing feed-forward 3D models have demonstrated strong performance in static reconstruction but still struggle to capture dynamic motion. To address these limitations, we propose DynamicVGGT, a unified feed-forward framework that extends VGGT from static 3D perception to dynamic 4D reconstruction. Our goal is to model point motion within feed-forward 3D models in a dynamic and temporally coherent manner. To this end, we jointly predict the current and future point maps within a shared reference coordinate system, allowing the model to implicitly learn dynamic point representations through temporal correspondence. To efficiently capture temporal dependencies, we introduce a Motion-aware Temporal Attention (MTA) module…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization