POMA-3D: The Point Map Way to 3D Scene Understanding

Ye Mao; Weixun Luo; Ranran Huang; Junpeng Jing; Krystian Mikolajczyk

arXiv:2511.16567 · cs.CV · May 7, 2026

TL;DR

POMA-3D is a self-supervised 3D representation model learned from point maps that transfers priors from 2D foundation models to diverse 3D scene understanding tasks using only geometric inputs.
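A point map stores, for every pixel of an H x W grid, the 3D coordinates of the surface seen at that pixel. That is why it keeps global geometry while still looking like an image to a 2D backbone. The sketch below shows one standard way to obtain a point map from a metric depth map: pinhole back-projection followed by a camera-to-world transform. It illustrates the data structure only and is not the authors' pipeline. The function name depth_to_point_map, the intrinsics fx, fy, cx, cy, the 4x4 cam_to_world pose, the OpenCV-style camera axes, and the integer pixel-coordinate convention are all hypothetical choices made here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def depth_to_point_map(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_to_world: Sequence[Sequence[float]],
) -> List[List[Point]]:
    """Back-project an H x W metric depth map into an H x W grid of world-space points.

    Hypothetical helper: assumes a pinhole camera with OpenCV-style axes
    (x right, y down, z forward) and pixel (u, v) at integer coordinates.
    """
    # Split the 4x4 camera-to-world matrix into rotation R and translation t.
    rot = [list(row[:3]) for row in cam_to_world[:3]]
    trans = [row[3] for row in cam_to_world[:3]]

    point_map: List[List[Point]] = []
    for v, depth_row in enumerate(depth):
        out_row: List[Point] = []
        for u, z in enumerate(depth_row):
            # Pinhole back-projection into camera coordinates.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Rigid transform into the shared world frame: p_w = R p_c + t.
            world = tuple(
                sum(rot[i][j] * cam[j] for j in range(3)) + trans[i]
                for i in range(3)
            )
            out_row.append(world)
        point_map.append(out_row)
    return point_map


if __name__ == "__main__":
    pose = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    pm = depth_to_point_map([[2.0, 2.0], [2.0, 2.0]], 1.0, 1.0, 0.5, 0.5, pose)
    print(pm[0][0])  # (-1.0, -1.0, 3.0)
```

Because the output keeps the H x W layout, an image-style encoder can consume it in place of RGB channels. Choosing a different world frame (for example, a per-scene canonical frame) changes the coordinate values but not the grid.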

Contribution

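The paper presents POMA-3D, a point map-based 3D representation model, together with a view-to-scene alignment strategy that transfers 2D priors, POMA-JEPA for geometrically consistent features across views, and ScenePoint, a new dataset for large-scale pretraining.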
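This page does not spell out the view-to-scene alignment objective. As a purely illustrative stand-in, the sketch below implements a symmetric InfoNCE contrastive loss. This is a common way to pull paired embeddings together while pushing unpaired ones apart; an example pairing would be an embedding computed from point maps and an embedding of the same content from a frozen 2D foundation model. The pairing, the function names, and the temperature of 0.07 are assumptions, not settings reported for POMA-3D.

```python
import math
from typing import List, Sequence


def _l2_normalize(vec: Sequence[float]) -> List[float]:
    # Assumes a non-zero vector; a zero vector would divide by zero here.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(
    queries: Sequence[Sequence[float]],
    keys: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """Mean InfoNCE loss in which keys[i] is the positive match for queries[i]."""
    q = [_l2_normalize(v) for v in queries]
    k = [_l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching key as the target class.
        total += log_denom - logits[i]
    return total / len(q)


def symmetric_alignment_loss(
    point_emb: Sequence[Sequence[float]],
    image_emb: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """Average of both matching directions (CLIP-style; hypothetical pairing)."""
    return 0.5 * (
        info_nce(point_emb, image_emb, temperature)
        + info_nce(image_emb, point_emb, temperature)
    )
```

With correctly paired rows the loss approaches zero, and shuffling the pairing drives it up. Averaging both directions keeps either embedding space from being favored as the sole anchor.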

Findings

1. POMA-3D outperforms existing methods on 3D scene understanding tasks.
2. It benefits downstream tasks such as 3D question answering, navigation, and scene retrieval.
3. It generalizes strongly while using only geometric inputs (3D coordinates).

Abstract

In this paper, we introduce POMA-3D, the first self-supervised 3D representation model learned from point maps. Point maps encode explicit 3D coordinates on a structured 2D grid, preserving global 3D geometry while remaining compatible with the input format of 2D foundation models. To transfer rich 2D priors into POMA-3D, a view-to-scene alignment strategy is designed. Moreover, as point maps are view-dependent with respect to a canonical space, we introduce POMA-JEPA, a joint embedding-predictive architecture that enforces geometrically consistent point map features across multiple views. Additionally, we introduce ScenePoint, a point map dataset constructed from 6.5K room-level RGB-D scenes and 1M 2D image scenes to facilitate large-scale POMA-3D pretraining. Experiments show that POMA-3D serves as a strong backbone for both specialist and generalist 3D understanding. It benefits…
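POMA-JEPA is described as a joint embedding-predictive architecture that enforces geometrically consistent point map features across views. The sketch below shows two generic JEPA ingredients in the style of I-JEPA. The first is a target encoder updated as an exponential moving average (EMA) of the online encoder. The second is a regression loss computed in embedding space rather than on raw inputs. The matches argument, which pairs tokens from two views assumed to cover the same 3D region, and the momentum of 0.996 are hypothetical. The paper's actual masking, predictor, and loss may differ.

```python
from typing import List, Sequence, Tuple


def ema_update(
    target: Sequence[float], online: Sequence[float], momentum: float = 0.996
) -> List[float]:
    """Move each target-encoder weight a small step toward the online encoder."""
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]


def cross_view_latent_loss(
    predicted_a: Sequence[Sequence[float]],
    target_b: Sequence[Sequence[float]],
    matches: Sequence[Tuple[int, int]],
) -> float:
    """Mean squared error between predicted tokens of view A and target tokens of view B.

    matches holds hypothetical (i, j) pairs: token i of view A and token j of
    view B are assumed to observe the same 3D region.
    """
    if not matches:
        raise ValueError("need at least one cross-view match")
    total = 0.0
    for i, j in matches:
        # Squared L2 distance in embedding space, not in input space.
        total += sum((p - t) ** 2 for p, t in zip(predicted_a[i], target_b[j]))
    return total / len(matches)
```

In an autodiff framework the target embeddings would be detached (stop-gradient), so only the online encoder and predictor receive gradients. This plain-Python version computes the values only.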

Code & Models

Repositories

https://matchlab-imperial.github.io/poma3d
