Utonia: Toward One Encoder for All Point Clouds

Yujia Zhang; Xiaoyang Wu; Yunhan Yang; Xianzhe Fan; Han Li; Yuechen Zhang; Zehao Huang; Naiyan Wang; Hengshuang Zhao

arXiv:2603.03283 · cs.CV · March 4, 2026

TL;DR

Utonia is a single self-supervised point transformer encoder trained jointly on point clouds from diverse 3D domains. Its shared representation improves perception and also benefits robotic manipulation, multimodal reasoning, and spatial understanding.

Contribution

Utonia takes a first step toward training a single self-supervised encoder across multiple diverse point cloud domains, yielding a representation space that is consistent across those domains.

Findings

01 Improves perception capabilities across domains.

02 Enhances robotic manipulation and multimodal reasoning.

03 Reveals emergent behaviors from joint training.

Abstract

We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we present Utonia, a first step toward training a single self-supervised point transformer encoder across diverse domains, spanning remote sensing, outdoor LiDAR, indoor RGB-D sequences, object-centric CAD models, and point clouds lifted from RGB-only videos. Despite their distinct sensing geometries, densities, and priors, Utonia learns a consistent representation space that transfers across domains. This unification improves perception capability while revealing intriguing emergent behaviors that arise only when domains are trained jointly. Beyond perception, we observe that Utonia representations can also benefit embodied and multimodal reasoning: conditioning vision-language-action policies on Utonia features improves robotic manipulation, and…

Code & Models

Models

🤗 Pointcept/Utonia · model · ♡ 14

Taxonomy

Topics: 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications · Robot Manipulation and Learning