Towards Foundation Models for 3D Scene Understanding: Instance-Aware Self-Supervised Learning for Point Clouds

Bin Yang; Mohamed Abdelsamad; Miao Zhang; Alexandru Paul Condurache

arXiv:2603.25165 · cs.CV · April 1, 2026

TL;DR

This paper introduces PointINS, a self-supervised learning framework for point clouds that enhances instance awareness through geometry-aware regularization, improving 3D scene understanding and localization.

Contribution

The paper proposes an instance-oriented SSL method with geometry-aware regularization strategies, advancing 3D foundation models for a range of downstream tasks.

Findings

01. +3.5% mAP for indoor instance segmentation

02. +4.1% PQ for outdoor panoptic segmentation

03. Improved transferability of 3D representations
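
For readers unfamiliar with the PQ metric cited above, the following is a minimal sketch of panoptic quality for a single class, assuming the standard definition (sum of IoUs over true-positive matches, divided by |TP| + 0.5|FP| + 0.5|FN|). The function name and arguments are hypothetical, and the paper's exact evaluation protocol is not shown on this page.

```python
from typing import Sequence


def panoptic_quality(
    true_positive_ious: Sequence[float],
    num_false_positives: int,
    num_false_negatives: int,
) -> float:
    """Panoptic quality (PQ) for one class under the standard definition.

    true_positive_ious holds the IoU of every predicted segment that was
    matched to a ground-truth segment (conventionally, IoU > 0.5).
    The argument names are illustrative assumptions, not from the paper.
    """
    num_true_positives = len(true_positive_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # No predictions and no ground truth for this class.
        return 0.0
    return sum(true_positive_ious) / denominator


if __name__ == "__main__":
    # Two matched segments, one spurious prediction, one missed segment.
    print(panoptic_quality([0.8, 0.6], 1, 1))
```

A reported PQ is typically the mean of this per-class value over all classes, so a gain can come from better-matched segments, fewer spurious predictions, or fewer missed instances.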

Abstract

Recent advances in self-supervised learning (SSL) for point clouds have substantially improved 3D scene understanding without human annotations. Existing approaches emphasize semantic awareness by enforcing feature consistency across augmented views or by masked scene modeling. However, the resulting representations transfer poorly to instance localization and often require full finetuning for strong performance. Because instance awareness is a fundamental component of 3D perception, bridging this gap is crucial for progressing toward true 3D foundation models that support all downstream tasks on 3D data. In this work, we introduce PointINS, an instance-oriented self-supervised framework that enriches point cloud representations through geometry-aware learning. PointINS employs an orthogonal offset branch to jointly learn high-level semantic understanding and geometric reasoning, yielding…
