InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

Shuai Yuan; Yantai Yang; Xiaotian Yang; Xupeng Zhang; Zhonghao Zhao; Lingming Zhang; Zhipeng Zhang

arXiv:2601.02281 · cs.CV · January 6, 2026

Open Access

TL;DR

InfiniteVGGT is a causal visual geometry transformer that keeps a bounded, rolling KV-cache memory and prunes it with a training-free strategy, allowing 3D geometry estimation over unbounded input streams with better long-term stability than existing streaming methods.

Contribution

The paper presents InfiniteVGGT, a causal transformer built around a bounded yet adaptive KV cache and a training-free, attention-agnostic pruning strategy, so that it can process streams of indefinite length while keeping long-term 3D geometry estimates stable.

Findings

01. Outperforms existing streaming methods in long-term stability

02. Supports truly infinite-horizon 3D geometry streaming

03. Introduces the Long3D benchmark for extended sequence evaluation

Abstract

The grand vision of enabling persistent, large-scale 3D visual geometry understanding is shackled by the irreconcilable demands of scalability and long-term stability. While offline models like VGGT achieve inspiring geometry capability, their batch-based nature renders them irrelevant for live systems. Streaming architectures, though the intended solution for live operation, have proven inadequate. Existing methods either fail to support truly infinite-horizon inputs or suffer from catastrophic drift over long sequences. We shatter this long-standing dilemma with InfiniteVGGT, a causal visual geometry transformer that operationalizes the concept of a rolling memory through a bounded yet adaptive and perpetually expressive KV cache. Capitalizing on this, we devise a training-free, attention-agnostic pruning strategy that intelligently discards obsolete information, effectively…
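The idea of a bounded rolling KV cache with attention-agnostic pruning can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `RollingKVCache`, the `budget` parameter, and in particular the eviction score (cosine similarity of each key to the mean key, used here as a stand-in measure of redundancy). The only property it shares with the description in the abstract is that pruning is training-free and never looks at attention weights.

```python
import math
from dataclasses import dataclass, field


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class RollingKVCache:
    """Hypothetical bounded KV cache; not the paper's implementation."""
    budget: int                                   # maximum number of cached tokens
    keys: list = field(default_factory=list)      # one key vector per token
    values: list = field(default_factory=list)    # one value vector per token

    def append(self, new_keys, new_values):
        """Add one frame's tokens, then prune back down to the budget."""
        self.keys.extend(new_keys)
        self.values.extend(new_values)
        self._prune()

    def _prune(self):
        excess = len(self.keys) - self.budget
        if excess <= 0:
            return
        n, dim = len(self.keys), len(self.keys[0])
        mean_key = [sum(k[d] for k in self.keys) / n for d in range(dim)]
        # Assumed redundancy score: keys closest to the mean key carry the
        # least distinctive information. No attention weights are needed.
        scores = [_cosine(k, mean_key) for k in self.keys]
        most_redundant_first = sorted(range(n), key=lambda i: scores[i], reverse=True)
        drop = set(most_redundant_first[:excess])
        self.keys = [k for i, k in enumerate(self.keys) if i not in drop]
        self.values = [v for i, v in enumerate(self.values) if i not in drop]


# Usage: stream many frames of 3 tokens each; memory stays bounded.
cache = RollingKVCache(budget=4)
for t in range(100):
    frame_keys = [[math.cos(t + j), math.sin(t + j)] for j in range(3)]
    frame_values = [[float(t), float(j)] for j in range(3)]
    cache.append(frame_keys, frame_values)
print(len(cache.keys))  # never exceeds the budget
```

Because the cache size is capped regardless of how many frames arrive, per-frame memory and attention cost stay constant over the stream; the choice of eviction score is what determines how much useful history survives.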

Taxonomy

Topics: Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis · Advanced Vision and Imaging