VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation

Yulu Gao; Bohao Zhang; Zongheng Tang; Jitong Liao; Wenjun Wu; Si Liu

arXiv:2604.13596·cs.CV·April 17, 2026

TL;DR

VGGT-Segmentor is a novel framework that combines geometric modeling with pixel-accurate segmentation for cross-view object segmentation, achieving state-of-the-art results without paired annotations.

Contribution

VGGT-Segmentor introduces a new segmentation head, the Union Segmentation Head, and a self-supervised training strategy, improving dense prediction accuracy in cross-view scenarios.

Findings

01

Achieves 67.7% and 68.0% average IoU on the Ego-Exo4D benchmark.

02

Outperforms prior methods in cross-view segmentation tasks.

03

Pretrained model surpasses many fully-supervised baselines.
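
The findings above are reported as average IoU (intersection over union). As a reminder of what that metric measures, here is a minimal sketch in plain Python. The function names `mask_iou` and `average_iou`, the treatment of two empty masks as a perfect match, and the simple mean over mask pairs are all assumptions for illustration; the paper's exact evaluation protocol is not described on this page.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels (hypothetical representation)


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks with the same shape."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def average_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Unweighted mean IoU over paired predicted and ground-truth masks."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need equally sized, non-empty lists of masks")
    scores = [mask_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

print(mask_iou(pred, target))  # 2 shared pixels / 4 pixels in the union
```

Whether scores are averaged per object, per frame, or per sequence changes the final number, so a figure computed this way is only comparable to published results if the averaging matches the benchmark's protocol.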

Abstract

Instance-level object segmentation across disparate egocentric and exocentric views is a fundamental challenge in visual understanding, critical for applications in embodied AI and remote collaboration. This task is exceptionally difficult due to severe changes in scale, perspective, and occlusion, which destabilize direct pixel-level matching. While recent geometry-aware models like VGGT provide a strong foundation for feature alignment, we find they often fail at dense prediction tasks due to significant pixel-level projection drift, even when their internal object-level attention remains consistent. To bridge this gap, we introduce VGGT-Segmentor (VGGT-S), a framework that unifies robust geometric modeling with pixel-accurate semantic segmentation. VGGT-S leverages VGGT's powerful cross-view feature representation and introduces a novel Union Segmentation Head. This head operates in…

Code & Models

Models

Hugging Face model: zbbhhh/VGGT-S
