PAGE-4D: VGGT-4D Perception via Disentangled Pose and Geometry Estimation

Kaichen Zhou; Yuhan Wang; Grace Chen; Xinhai Chang; Gaspard Beaudouin; Fangneng Zhan; Paul Pu Liang; Mengyu Wang

arXiv:2510.17568·cs.CV·May 15, 2026

TL;DR

PAGE-4D extends the VGGT model to dynamic scenes, enabling simultaneous camera pose, depth, and point cloud estimation without post-processing by disentangling static and dynamic information.

Contribution

PAGE-4D introduces a dynamics-aware aggregator that improves 4D perception by disentangling the static and dynamic components of a scene within a single feed-forward model.

Findings

01. Outperforms VGGT in dynamic scenarios for pose and depth estimation

02. Achieves accurate 4D reconstruction without post-processing

03. Demonstrates robustness in complex real-world scenes

Abstract

Recent 3D feed-forward models, such as the Visual Geometry Grounded Transformer (VGGT), have shown strong capability in inferring 3D attributes of static scenes. However, because they are typically trained on static datasets, these models often struggle in real-world scenarios involving complex dynamic elements, such as moving humans or deformable objects like umbrellas. To address this limitation, we introduce PAGE-4D, a feed-forward model that extends VGGT to dynamic scenes, enabling camera pose estimation, depth prediction, and point cloud reconstruction, all without post-processing. A central challenge in multitask 4D reconstruction is the inherent conflict between tasks: accurate camera pose estimation requires suppressing dynamic regions, while geometry reconstruction requires modeling them. To resolve this tension, we propose a dynamics-aware aggregator that disentangles static and…
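The task conflict described in the abstract can be illustrated with a toy sketch. This is not PAGE-4D's aggregator, whose mechanism is not described in the truncated text above. It only shows, under simplifying assumptions, why one shared feature pool serves the two tasks poorly: a pose branch benefits from down-weighting tokens that are likely dynamic, while a geometry branch needs those same tokens. The function names `weighted_mean` and `split_aggregate`, and the per-token `dynamic_prob` score, are hypothetical and introduced here for illustration only.

```python
from typing import List, Sequence, Tuple


def weighted_mean(tokens: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Return the weighted average of equal-length feature vectors."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(tokens[0])
    return [
        sum(w * t[d] for t, w in zip(tokens, weights)) / total
        for d in range(dim)
    ]


def split_aggregate(tokens: Sequence[Sequence[float]],
                    dynamic_prob: Sequence[float]
                    ) -> Tuple[List[float], List[float]]:
    """Toy illustration (hypothetical, not the paper's method).

    dynamic_prob[i] in [0, 1] is an assumed score for how likely
    token i belongs to a moving region.
    """
    # Pose branch: suppress tokens that are likely dynamic.
    pose_weights = [1.0 - p for p in dynamic_prob]
    # Geometry branch: keep every token, moving or not.
    geom_weights = [1.0 for _ in dynamic_prob]
    return (weighted_mean(tokens, pose_weights),
            weighted_mean(tokens, geom_weights))


# Two static tokens and one token on a moving object.
tokens = [[1.0, 0.0], [1.0, 0.0], [5.0, 4.0]]
dynamic_prob = [0.0, 0.0, 1.0]

pose_feat, geom_feat = split_aggregate(tokens, dynamic_prob)
print(pose_feat)  # the moving token has no influence on the pose feature
print(geom_feat)  # the moving token still contributes to the geometry feature
```

In this toy setting the pose feature is computed from the static tokens only, while the geometry feature still reflects the moving object. A learned model would predict the scores rather than receive them as input.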

Code & Models

Repositories

https://page4d.github.io
