UniT: Unified Geometry Learning with Group Autoregressive Transformer

Haotian Wang; Yusong Huang; Zhaonian Kuang; Hongliang Lu; Xinhu Zheng; Meng Yang; Gang Hua

arXiv:2605.21131·cs.CV·May 21, 2026

TL;DR

UniT introduces a unified transformer-based model that integrates various geometry perception tasks, enabling online and offline 3D reconstruction with improved scale generalization and state-of-the-art results.

Contribution

The paper proposes UniT, a novel Group Autoregressive Transformer that unifies diverse geometry perception capabilities within a single framework, handling multiple modalities and scales.

Findings

01. Achieves state-of-the-art performance on ten benchmarks across seven tasks.

02. Effectively unifies online perception and offline reconstruction within one model.

03. Demonstrates improved metric-scale generalization across different scenes.

Abstract

Recent feed-forward models have significantly advanced geometry perception, which infers dense 3D structure from sensor observations. However, the essential capabilities of geometry perception remain fragmented across multiple incompatible paradigms, including online perception, offline reconstruction, multi-modal integration, long-horizon scalability, and metric-scale estimation. We present UniT, a unified model built upon a novel Group Autoregressive Transformer, which reformulates these seemingly disparate capabilities within a single framework. The key idea is to treat groups of sensor observations as the basic autoregressive units and predict the corresponding point maps in an anchor-free and scale-adaptive manner. More specifically, diverse view configurations in both online and offline settings are naturally unified within a single group autoregression process. By varying the group size, online mode…
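The idea of using groups of observations as autoregressive units can be illustrated with a small sketch. The code below is not taken from the UniT repository; the function names `split_into_groups` and `group_causal_mask` are hypothetical, and the masking rule (full attention inside a group, causal attention across groups) is an assumption about how such a scheme is commonly built. How the paper itself maps group sizes to its modes is cut off in the abstract above, so the reading in the usage note is an assumption as well.

```python
from typing import List


def split_into_groups(num_views: int, group_size: int) -> List[List[int]]:
    """Partition view indices 0..num_views-1 into consecutive groups.

    Hypothetical helper for illustration; the last group may be shorter.
    """
    if num_views < 0 or group_size < 1:
        raise ValueError("num_views must be >= 0 and group_size >= 1")
    return [
        list(range(start, min(start + group_size, num_views)))
        for start in range(0, num_views, group_size)
    ]


def group_causal_mask(num_views: int, group_size: int) -> List[List[bool]]:
    """Build a view-level attention mask, assuming group-wise causality.

    mask[i][j] is True when view i may attend to view j. Views attend
    freely within their own group and to every earlier group, but never
    to a later group.
    """
    if num_views < 0 or group_size < 1:
        raise ValueError("num_views must be >= 0 and group_size >= 1")
    group_of = [i // group_size for i in range(num_views)]
    return [
        [group_of[j] <= group_of[i] for j in range(num_views)]
        for i in range(num_views)
    ]


if __name__ == "__main__":
    # Four views in groups of two: views 0-1 see each other only,
    # views 2-3 see all four views.
    for row in group_causal_mask(4, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions, a group size of 1 reduces the mask to ordinary frame-by-frame causal attention, while a group size equal to the number of views yields full bidirectional attention over all views. Intermediate sizes fall between the two.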

Code & Models

Repositories

wang-xjtu/UniT (GitHub)