GGPT: Geometry Grounded Point Transformer

Yutong Chen; Yiming Wang; Xucong Zhang; Sergey Prokudin; Siyu Tang

arXiv:2603.11174 · cs.CV · March 13, 2026

Open Access

TL;DR

GGPT introduces a geometry-grounded point transformer that combines geometric priors with dense feed-forward predictions to improve 3D reconstruction accuracy, consistency, and detail recovery from sparse RGB views.

Contribution

The paper presents a novel framework integrating geometric guidance with a point transformer for enhanced 3D reconstruction, including an improved SfM pipeline and explicit partial-geometry supervision.

Findings

01. Outperforms state-of-the-art models in 3D reconstruction accuracy.

02. Produces geometrically consistent and spatially complete reconstructions.

03. Generalizes well across different datasets and architectures.

Abstract

Recent feed-forward networks have achieved remarkable progress in sparse-view 3D reconstruction by predicting dense point maps directly from RGB images. However, they often suffer from geometric inconsistencies and limited fine-grained accuracy due to the absence of explicit multi-view constraints. We introduce the Geometry-Grounded Point Transformer (GGPT), a framework that augments feed-forward reconstruction with reliable sparse geometric guidance. We first propose an improved Structure-from-Motion pipeline based on dense feature matching and lightweight geometric optimisation to efficiently estimate accurate camera poses and partial 3D point clouds from sparse input views. Building on this foundation, we propose a geometry-guided 3D point transformer that refines dense point maps under explicit partial-geometry supervision using an optimised guidance encoding. Extensive experiments…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Advanced Vision and Imaging