Robust 4D Visual Geometry Transformer with Uncertainty-Aware Priors

Ying Zang; Yidong Han; Chaotao Ding; Yuanqi Hu; Deyi Ji; Qi Zhu; Xuanfu Li; Jin Ma; Lingyun Sun; Tianrun Chen; Lanyun Zhu

arXiv:2604.09366·cs.CV·April 13, 2026

TL;DR

This paper presents a framework for dynamic 4D scene reconstruction that separates moving from static content using uncertainty-aware priors and attention cues. It reports a 13.43% reduction in Mean Accuracy error and a 10.49% gain in segmentation F-measure on benchmarks.

Contribution

The proposed approach combines entropy-guided subspace projection, local geometry purification, and uncertainty-aware cross-view consistency to improve dynamic scene reconstruction without scene-specific tuning.

Findings

01. Reduces Mean Accuracy error by 13.43% on benchmarks.

02. Improves segmentation F-measure by 10.49%.

03. Outperforms state-of-the-art methods in dynamic 4D reconstruction.

Abstract

Reconstructing dynamic 4D scenes is an important yet challenging task. While 3D foundation models like VGGT excel in static settings, they often struggle with dynamic sequences where motion causes significant geometric ambiguity. To address this, we present a framework designed to disentangle dynamic and static components by modeling uncertainty across different stages of the reconstruction process. Our approach introduces three synergistic mechanisms: (1) Entropy-Guided Subspace Projection, which leverages information-theoretic weighting to adaptively aggregate multi-head attention distributions, effectively isolating dynamic motion cues from semantic noise; (2) Local-Consistency Driven Geometry Purification, which enforces spatial continuity via radius-based neighborhood constraints to eliminate structural outliers; and (3) Uncertainty-Aware Cross-View Consistency, which formulates…
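
The first two mechanisms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the formulas, so the weighting rule (a softmax over negative Shannon entropy, favoring peaked heads), the `temperature` parameter, the brute-force neighbor search, and all function names below are assumptions chosen for illustration only.

```python
import math


def shannon_entropy(p):
    """Entropy (in nats) of a discrete distribution given as a list of probabilities."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def entropy_weighted_aggregate(head_attn, temperature=1.0):
    """Combine per-head attention distributions using entropy-derived weights.

    head_attn: list of H lists, each a probability distribution over N tokens.
    ASSUMPTION (not from the paper): heads with lower entropy (more peaked
    attention) receive larger weights through a softmax over negative entropy.
    Returns (combined distribution over N tokens, per-head weights).
    """
    entropies = [shannon_entropy(p) for p in head_attn]
    logits = [-h / temperature for h in entropies]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    n = len(head_attn[0])
    combined = [sum(w * p[j] for w, p in zip(weights, head_attn)) for j in range(n)]
    return combined, weights


def radius_filter(points, radius, min_neighbors):
    """Keep points that have at least `min_neighbors` other points within `radius`.

    ASSUMPTION: a plain radius-outlier filter stands in for the paper's
    "radius-based neighborhood constraints"; the search is brute force, O(N^2).
    """
    r2 = radius * radius
    kept = []
    for i, p in enumerate(points):
        count = 0
        for j, q in enumerate(points):
            if i != j and sum((a - b) ** 2 for a, b in zip(p, q)) <= r2:
                count += 1
        if count >= min_neighbors:
            kept.append(p)
    return kept


if __name__ == "__main__":
    # Two toy heads: one sharply peaked, one uniform.
    heads = [[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]]
    combined, weights = entropy_weighted_aggregate(heads)
    print("head weights:", weights)
    print("combined attention:", combined)

    # Three clustered points and one isolated point.
    cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (5.0, 5.0, 5.0)]
    print("kept points:", radius_filter(cloud, radius=0.5, min_neighbors=2))
```

Because the head weights sum to one and each head is itself a distribution, the combined output remains a valid distribution over tokens. For real point clouds, a spatial index such as a k-d tree would replace the quadratic neighbor loop.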
