DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images

Xiaoxue Chen; Ziyi Xiong; Yuantao Chen; Gen Li; Nan Wang; Hongcheng Luo; Long Chen; Haiyang Sun; Bing Wang; Guang Chen; Hangjun Ye; Hongyang Li; Ya-Qin Zhang; Hao Zhao

arXiv:2512.03004 · cs.CV · December 3, 2025

Open Access

TL;DR

DGGT is a fast, pose-free 4D scene reconstruction method for dynamic driving scenes that predicts camera poses and 3D Gaussian maps directly from unposed images, enabling scalable and high-quality reconstructions.

Contribution

The paper introduces DGGT, a novel feedforward framework that jointly predicts camera poses and 3D scene representations from unposed images, eliminating the need for per-scene optimization or known calibration.

Findings

01

Outperforms prior methods on large-scale driving benchmarks.

02

Supports an arbitrary number of views and long sequences.

03

Achieves state-of-the-art speed and quality in 4D reconstruction.

Abstract

Autonomous driving needs fast, scalable 4D reconstruction and re-simulation for training and evaluation, yet most methods for dynamic driving scenes still rely on per-scene optimization, known camera calibration, or short frame windows, which makes them slow and impractical. We revisit this problem from a feedforward perspective and introduce the Driving Gaussian Grounded Transformer (DGGT), a unified framework for pose-free dynamic scene reconstruction. We note that existing formulations, which treat camera pose as a required input, limit flexibility and scalability. Instead, we reformulate pose as an output of the model, enabling reconstruction directly from sparse, unposed images and supporting an arbitrary number of views for long sequences. Our approach jointly predicts per-frame 3D Gaussian maps and camera parameters, disentangles dynamics with a lightweight dynamic head, and…
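The abstract treats camera pose as a model output produced alongside per-frame Gaussians and a dynamic head. A rough stdlib-only Python sketch can show why predicted poses are enough to place per-frame outputs into one scene. It lifts each frame's Gaussian centers into a shared world frame using that frame's predicted camera-to-world pose, then splits the centers into static and dynamic sets using a per-Gaussian dynamic probability. `FramePrediction`, `to_world`, `assemble_scene`, the 0.5 threshold, and the thresholding rule are all hypothetical illustrations. They are not the paper's interface or its actual dynamic head. Real Gaussians also carry covariances, opacities, and colors, which are omitted here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class FramePrediction:
    """Hypothetical per-frame output of a pose-free feedforward model."""
    timestamp: float
    rotation: Mat3             # predicted camera-to-world rotation (row-major)
    translation: Vec3          # predicted camera-to-world translation
    centers_cam: List[Vec3]    # Gaussian centers in this frame's camera coordinates
    dynamic_prob: List[float]  # hypothetical per-Gaussian probability of being dynamic


def to_world(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Map a camera-frame point to world coordinates: x_world = R @ x_cam + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def assemble_scene(frames: List[FramePrediction], threshold: float = 0.5):
    """Lift every frame's Gaussians into the world frame and split static from dynamic.

    Static centers are pooled across frames. Dynamic centers keep their timestamp
    so they can be rendered at the matching time step.
    """
    static: List[Vec3] = []
    dynamic: List[Tuple[float, Vec3]] = []
    for f in frames:
        for p, prob in zip(f.centers_cam, f.dynamic_prob):
            x = to_world(f.rotation, f.translation, p)
            if prob >= threshold:
                dynamic.append((f.timestamp, x))
            else:
                static.append(x)
    return static, dynamic


if __name__ == "__main__":
    I = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    # Two frames; in the second, the predicted camera has moved 1 unit along x.
    f0 = FramePrediction(0.0, I, (0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)], [0.1, 0.9])
    f1 = FramePrediction(0.1, I, (1.0, 0.0, 0.0), [(1.0, 0.0, 0.0)], [0.2])
    static, dynamic = assemble_scene([f0, f1])
    print(static)   # [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(dynamic)  # [(0.0, (0.0, 2.0, 0.0))]
```

Because the poses come from the model rather than from calibration, the same assembly step works for any number of frames. A practical system would also need to merge or prune the static Gaussians that overlap across frames, which this sketch does not attempt.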

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Robotics and Sensor-Based Localization