Geometric Context Transformer for Streaming 3D Reconstruction

Lin-Zhuo Chen; Jian Gao; Yihang Chen; Ka Leong Cheng; Yipengjing Sun; Liangxiao Hu; Nan Xue; Xing Zhu; Yujun Shen; Yao Yao; Yinghao Xu

arXiv:2604.14141·cs.CV·April 17, 2026

TL;DR

LingBot-Map is a geometric context transformer model for real-time, accurate, and temporally consistent streaming 3D scene reconstruction from video; the paper reports that it outperforms existing streaming and optimization-based methods.

Contribution

The paper introduces LingBot-Map, a feed-forward 3D foundation model whose attention mechanism combines an anchor context, a pose-reference window, and a trajectory memory for efficient, stable streaming 3D reconstruction.

Findings

01. Achieves around 20 FPS at 518 x 378 resolution over long sequences.

02. Outperforms existing streaming and optimization-based approaches on various benchmarks.

03. Maintains rich geometric context with a compact streaming state.

Abstract

Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric accuracy, temporal consistency, and computational efficiency. Motivated by the principles of Simultaneous Localization and Mapping (SLAM), we introduce LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. A defining aspect of LingBot-Map lies in its carefully designed attention mechanism, which integrates an anchor context, a pose-reference window, and a trajectory memory to address coordinate grounding, dense geometric cues, and long-range drift correction, respectively. This design keeps the streaming state compact while retaining rich geometric context, enabling stable, efficient inference at around 20 FPS on 518 x 378 resolution…
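The abstract names three context components but gives no implementation details here. As a rough illustration of how such a streaming state can stay compact, the sketch below keeps an anchor frame, a sliding window of recent frames, and a one-token-per-frame memory of everything older. This is not the LingBot-Map implementation: the class, its fields, and the token counts are all hypothetical and chosen only to make the bookkeeping concrete.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical token budgets, not values from the paper.
DENSE_TOKENS = 256    # tokens kept for a fully retained frame
SUMMARY_TOKENS = 1    # tokens kept for a frame demoted to memory


@dataclass
class StreamingContext:
    """Illustrative bounded context: anchor + recent window + compact memory."""
    num_anchor: int = 1   # first frames, kept to fix the coordinate frame
    window_size: int = 4  # most recent frames, kept densely
    anchor: list = field(default_factory=list)
    window: deque = field(default_factory=deque)
    memory: list = field(default_factory=list)

    def push(self, frame_id):
        # The earliest frames become the anchor and are never evicted.
        if len(self.anchor) < self.num_anchor:
            self.anchor.append(frame_id)
            return
        # Later frames enter the window; the oldest one is demoted to memory.
        self.window.append(frame_id)
        if len(self.window) > self.window_size:
            self.memory.append(self.window.popleft())

    def token_count(self):
        dense = (len(self.anchor) + len(self.window)) * DENSE_TOKENS
        return dense + len(self.memory) * SUMMARY_TOKENS


ctx = StreamingContext()
for frame_id in range(100):
    ctx.push(frame_id)

print(ctx.token_count())      # 1375
print(100 * DENSE_TOKENS)     # 25600 if every frame were kept densely
```

Under these assumed budgets, the dense part of the state is fixed by the anchor and window sizes, and only the small per-frame memory grows with sequence length.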

Code & Models

Repositories

robbyant/lingbot-map (GitHub)
