TL;DR
Ground4D introduces a spatially-grounded, pose-free 4D reconstruction framework that improves off-road scene modeling by resolving temporal conflicts through localized conditioning and auxiliary geometric cues.
Contribution
It proposes voxel-grounded temporal Gaussian aggregation and surface normal cues to enhance 4D reconstruction quality in unstructured off-road scenes.
Findings
Outperforms existing feedforward methods in reconstruction quality.
Generalizes zero-shot to unseen off-road domains.
Demonstrates effectiveness on ORAD-3D and RELLIS-3D datasets.
Abstract
Feedforward Gaussian Splatting has recently emerged as an efficient paradigm for 4D reconstruction in autonomous driving. However, in unstructured off-road scenes, its performance degrades due to high-frequency geometry, ego-motion jitter, and increased non-rigid dynamics. These factors introduce conflicting Gaussian observations across timestamps, leading to either over-smoothed renderings or structural artifacts. To address this issue, we propose Ground4D, a spatially-grounded 4D feedforward framework for pose-free off-road reconstruction. The key idea is to resolve temporal conflicts through spatially localized conditioning. Specifically, we introduce voxel-grounded temporal Gaussian aggregation, which partitions the canonical Gaussian space into spatial voxels and performs query-conditioned temporal attention within each voxel. Intra-voxel softmax normalization ensures that temporal…
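The intra-voxel softmax described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the cubic voxel grid with a fixed `voxel_size`, and the use of precomputed scalar attention logits (standing in for query-key scores) are all assumptions made for illustration. It shows only the normalization idea: Gaussian observations from different timestamps compete for weight only against other observations in the same voxel.

```python
import math
from collections import defaultdict


def voxel_key(position, voxel_size):
    """Map a 3D position to its integer voxel index (assumed cubic grid)."""
    return tuple(math.floor(c / voxel_size) for c in position)


def intra_voxel_softmax(positions, logits, voxel_size):
    """Return one weight per observation; weights within each voxel sum to 1.

    `logits` are hypothetical attention scores, one per observation.
    """
    groups = defaultdict(list)
    for i, p in enumerate(positions):
        groups[voxel_key(p, voxel_size)].append(i)

    weights = [0.0] * len(positions)
    for members in groups.values():
        # Subtract the per-voxel maximum for numerical stability.
        m = max(logits[i] for i in members)
        exps = [math.exp(logits[i] - m) for i in members]
        total = sum(exps)
        for i, e in zip(members, exps):
            weights[i] = e / total
    return weights


def aggregate(positions, logits, values, voxel_size):
    """Weighted sum of per-observation value vectors, one result per voxel."""
    weights = intra_voxel_softmax(positions, logits, voxel_size)
    out = {}
    for p, w, v in zip(positions, weights, values):
        acc = out.setdefault(voxel_key(p, voxel_size), [0.0] * len(v))
        for d, x in enumerate(v):
            acc[d] += w * x
    return out


# Three observations: two share a voxel, one sits alone in a neighboring voxel.
positions = [(0.1, 0.1, 0.1), (0.4, 0.2, 0.3), (1.2, 0.1, 0.1)]
logits = [0.0, 0.0, 5.0]
values = [[1.0, 0.0], [3.0, 0.0], [7.0, 7.0]]
fused = aggregate(positions, logits, values, voxel_size=1.0)
```

Because normalization is local, the large logit of the third observation does not suppress the two observations in the other voxel; they split their voxel's weight equally.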
