GPA-VGGT: Adapting VGGT to Large Scale Localization by Self-Supervised Learning with Geometry and Physics Aware Loss

Yangfan Xu; Lilian Zhang; Xiaofeng He; Pengdong Wu; Wenqi Wu; Jun Mao

arXiv:2601.16885·cs.CV·April 3, 2026

TL;DR

This paper introduces GPA-VGGT, a self-supervised learning framework for VGGT models that improves large-scale camera localization without requiring labeled data, by leveraging geometric and physical constraints.

Contribution

The paper extends VGGT with a self-supervised training method that uses sequence-wise geometric constraints and physical photometric consistency, enabling effective large-scale localization.

Findings

01. The model converges within hundreds of iterations.

02. It achieves significant improvements in large-scale localization.

03. It effectively captures multi-view geometry through joint optimization.

Abstract

Transformer-based general visual geometry frameworks have shown promising performance in camera pose estimation and 3D scene understanding. Recent advancements in Visual Geometry Grounded Transformer (VGGT) models have shown great promise in camera pose estimation and 3D reconstruction. However, these models typically rely on ground truth labels for training, posing challenges when adapting to unlabeled and unseen scenes. In this paper, we propose a self-supervised framework to train VGGT with unlabeled data, thereby enhancing its localization capability in large-scale environments. To achieve this, we extend conventional pair-wise relations to sequence-wise geometric constraints for self-supervised learning. Specifically, in each sequence, we sample multiple source frames and geometrically project them onto different target frames, which improves temporal feature consistency. We…
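The sequence-wise sampling and photometric consistency described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`project_to_source`, `photometric_loss`, `sequence_loss`), the camera-to-world pose convention, the shared intrinsics `K`, the plain L1 error, and the nearest-neighbour sampling are all assumptions made for illustration. A real training setup would presumably take depths and poses from the network's predictions and use differentiable bilinear sampling in a deep learning framework.

```python
import itertools
import numpy as np

def project_to_source(depth_t, K, T_t2s):
    """Back-project target pixels using depth_t, move them into the source
    camera frame with the 4x4 transform T_t2s, and project with K (3x3).
    Returns source pixel coordinates (H, W, 2) and a validity mask (H, W)."""
    H, W = depth_t.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N
    cam_t = (np.linalg.inv(K) @ pix) * depth_t.reshape(1, -1)          # 3 x N
    cam_s = T_t2s[:3, :3] @ cam_t + T_t2s[:3, 3:4]
    z = cam_s[2]
    proj = K @ cam_s
    uv = (proj[:2] / np.clip(z, 1e-6, None)).T.reshape(H, W, 2)
    # Keep only points in front of the camera that land inside the image.
    valid = z.reshape(H, W) > 1e-6
    valid &= (uv[..., 0] >= 0) & (uv[..., 0] <= W - 1)
    valid &= (uv[..., 1] >= 0) & (uv[..., 1] <= H - 1)
    return uv, valid

def photometric_loss(img_t, img_s, depth_t, K, T_t2s):
    """Mean L1 difference between the target image and the source image
    sampled at the reprojected locations (nearest-neighbour, assumption)."""
    uv, valid = project_to_source(depth_t, K, T_t2s)
    if not valid.any():
        return 0.0
    coords = np.rint(uv[valid]).astype(int)   # M x 2, columns are (u, v)
    warped = img_s[coords[:, 1], coords[:, 0]]
    return float(np.abs(warped - img_t[valid]).mean())

def sequence_loss(images, depths, poses_c2w, K, num_pairs=4, seed=0):
    """Sample several ordered (source, target) pairs from one sequence and
    average their photometric losses, instead of using a single fixed pair."""
    rng = np.random.default_rng(seed)
    pairs = list(itertools.permutations(range(len(images)), 2))
    chosen = rng.choice(len(pairs), size=min(num_pairs, len(pairs)), replace=False)
    losses = []
    for idx in chosen:
        s, t = pairs[idx]
        # Target camera -> world -> source camera.
        T_t2s = np.linalg.inv(poses_c2w[s]) @ poses_c2w[t]
        losses.append(photometric_loss(images[t], images[s], depths[t], K, T_t2s))
    return float(np.mean(losses))
```

In this sketch, a correct depth and relative pose make the warped source image agree with the target image, so the loss approaches zero; errors in either quantity raise it, which is what lets the signal supervise geometry without labels.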

Code & Models

Repositories

X-yangfan/GPA-VGGT (GitHub)
