Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding

Yunsong Wang; Na Zhao; Gim Hee Lee

arXiv:2406.11283 · cs.CV · June 18, 2024

TL;DR

This paper introduces a novel generative Bayesian network for creating diverse synthetic 3D scenes to improve self-supervised representation learning, enhancing transferability to real-world 3D scene understanding tasks.

Contribution

It proposes a new synthetic data generation method combined with joint contrastive and occlusion-aware reconstruction learning for better 3D scene representations.

Findings

01 Outperforms existing pre-training methods on 3D detection tasks
02 Achieves superior results on 3D semantic segmentation benchmarks
03 Demonstrates effective transfer of synthetic pre-training to real-world data

Abstract

The field of self-supervised 3D representation learning has emerged as a promising solution to alleviate the challenge presented by the scarcity of extensive, well-annotated datasets. However, it continues to be hindered by the lack of diverse, large-scale, real-world 3D scene datasets for source data. To address this shortfall, we propose Generalizable Representation Learning (GRL), where we devise a generative Bayesian network to produce diverse synthetic scenes with real-world patterns, and conduct pre-training with a joint objective. By jointly learning a coarse-to-fine contrastive learning task and an occlusion-aware reconstruction task, the model is primed with transferable, geometry-informed representations. Post pre-training on synthetic data, the acquired knowledge of the model can be seamlessly transferred to two principal downstream tasks associated with 3D scene…
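
The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain InfoNCE loss in place of the coarse-to-fine contrastive task, a masked mean squared error in place of the occlusion-aware reconstruction task, and a simple weighted sum to combine them. All function names, the `weight` parameter, and the `temperature` default are hypothetical choices made for illustration only.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchors[i] should match positives[i] and no other row.

    Assumption: stands in for the paper's coarse-to-fine contrastive task,
    whose exact form is not given in this page.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # cross-entropy with the matching index
    return total / len(a)


def masked_reconstruction_loss(pred, target, occluded):
    """Mean squared error over points flagged as occluded.

    Assumption: a simple proxy for an occlusion-aware reconstruction task.
    """
    errors = [
        sum((x - y) ** 2 for x, y in zip(p, t))
        for p, t, is_occluded in zip(pred, target, occluded)
        if is_occluded
    ]
    return sum(errors) / len(errors) if errors else 0.0


def joint_loss(anchors, positives, pred, target, occluded,
               weight=1.0, temperature=0.1):
    """Hypothetical weighted sum of the two pre-training terms."""
    contrastive = info_nce(anchors, positives, temperature)
    reconstruction = masked_reconstruction_loss(pred, target, occluded)
    return contrastive + weight * reconstruction
```

In this sketch, `anchors` and `positives` would be features of the same points under two augmented views, while `pred` and `target` would be predicted and ground-truth coordinates, with `occluded` marking the points hidden from the sensor viewpoint. How the paper actually weights or schedules the two terms is not stated here.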

Taxonomy

Topics: Advanced Vision and Imaging · 3D Surveying and Cultural Heritage · Robotics and Sensor-Based Localization

Methods: Contrastive Learning