SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

Zhenyu Li; Zehui Chen; Ang Li; Liangji Fang; Qinhong Jiang; Xianming Liu; Junjun Jiang; Bolei Zhou; Hang Zhao

arXiv:2112.04680·cs.CV·January 19, 2022

TL;DR

SimIPU introduces a novel unsupervised pre-training method that enhances 2D image representations with 3D spatial awareness using multi-modal contrastive learning, improving performance on 3D-related vision tasks.

Contribution

According to the authors, this work is the first to apply contrastive learning pre-training to outdoor multi-modal datasets that combine images and LiDAR point clouds, with the goal of learning spatial-aware visual representations.

Findings

01 Effective spatial-aware representations learned from point clouds.

02 Successful transfer of spatial perception to image encoders.

03 First contrastive pre-training approach for outdoor multi-modal data.

Abstract

Pre-training has become a standard paradigm in many computer vision tasks. However, most methods are designed for the RGB image domain. Because of the discrepancy between the two-dimensional image plane and three-dimensional space, such pre-trained models fail to perceive spatial information and are sub-optimal for 3D-related tasks. To bridge this gap, we aim to learn a spatial-aware visual representation that can describe three-dimensional space and is more suitable and effective for these tasks. To leverage point clouds, which provide far better spatial information than images, we propose a simple yet effective 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU. Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module to learn a…
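
The abstract is cut off before it describes the loss, so the following is only a generic illustration of multi-modal contrastive learning, not the SimIPU implementation. It sketches a symmetric InfoNCE-style loss, assuming that row i of the image features and row i of the point features describe the same 3D location. The function names, the temperature value, and the pairing scheme are all hypothetical choices made for this example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss, assuming anchors[i] is paired with positives[i].

    Every other row of `positives` acts as a negative for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i against every candidate, scaled.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching index i as the target.
        total += log_denominator - logits[i]
    return total / len(a)


def symmetric_contrastive_loss(image_feats, point_feats, temperature=0.1):
    """Average the image-to-point and point-to-image InfoNCE terms."""
    return 0.5 * (
        info_nce(image_feats, point_feats, temperature)
        + info_nce(point_feats, image_feats, temperature)
    )


# Toy features (hypothetical): two locations, two-dimensional embeddings.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
point_feats_aligned = [[2.0, 0.0], [0.0, 3.0]]
point_feats_swapped = [[0.0, 3.0], [2.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(image_feats, point_feats_aligned)
loss_swapped = symmetric_contrastive_loss(image_feats, point_feats_swapped)
```

With these toy inputs the correctly paired features give a much lower loss than the swapped ones, which is the property a contrastive objective relies on to pull matching image and point features together. In practice the features would come from an image encoder and a point cloud encoder, typically in a tensor library such as the PyTorch code in the official repository.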

Code & Models

Repositories

zhyever/simipu (PyTorch, official)

Videos

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging

Methods: Contrastive Learning