P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding
Yunze Liu, Li Yi, Shanghang Zhang, Qingnan Fan, Thomas Funkhouser, Hao Dong

TL;DR
P4Contrast is a contrastive pretraining method for RGB-D scene understanding that contrasts pairs of point-pixel pairs, so that features are learned jointly from the point and pixel modalities. It outperforms previous pretraining methods on ScanNet, SUN RGB-D, and 3RScan.
Contribution
The paper proposes a contrastive learning approach for RGB-D data built on pairs of point-pixel pairs: positives are pairs of RGB-D points in correspondence, and negatives are pairs in which one of the two modalities has been disturbed.
Findings
Outperforms previous pretraining methods on ScanNet, SUN RGB-D, and 3RScan datasets.
Effectively learns features from both RGB and depth modalities.
Provides a flexible framework for hard negative sampling in multi-modal contrastive learning.
Abstract
Self-supervised representation learning is a critical problem in computer vision, as it provides a way to pretrain feature extractors on large unlabeled datasets that can be used as an initialization for more efficient and effective training on downstream tasks. A promising approach is to use contrastive learning to learn a latent space where features are close for similar data samples and far apart for dissimilar ones. This approach has demonstrated tremendous success for pretraining both image and point cloud feature extractors, but it has been barely investigated for multi-modal RGB-D scans, especially with the goal of facilitating high-level scene understanding. To solve this problem, we propose contrasting "pairs of point-pixel pairs", where positives include pairs of RGB-D points in correspondence, and negatives include pairs where one of the two modalities has been disturbed…
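The contrast described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic InfoNCE-style loss, a hypothetical `fuse` function that concatenates and normalizes a point feature and a pixel feature, and hand-picked toy feature vectors. Negatives are built by keeping one modality and swapping the other for a feature from a different location, which is one plausible reading of "one of the two modalities has been disturbed".

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def fuse(point_feat, pixel_feat):
    """Hypothetical fusion (an assumption): concatenate, then L2-normalize."""
    return normalize(list(point_feat) + list(pixel_feat))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log softmax of the positive similarity."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Toy features (assumed values): location A seen in two views, plus location B.
point_a_view1, pixel_a_view1 = [1.0, 0.0], [0.0, 1.0]
point_a_view2, pixel_a_view2 = [0.9, 0.1], [0.1, 0.9]
point_b, pixel_b = [0.0, 1.0], [1.0, 0.0]

anchor = fuse(point_a_view1, pixel_a_view1)
# Positive: the corresponding point-pixel pair from the other view.
positive = fuse(point_a_view2, pixel_a_view2)
# Negatives: one modality kept, the other swapped in from location B.
pixel_disturbed = fuse(point_a_view2, pixel_b)
point_disturbed = fuse(point_b, pixel_a_view2)

loss_matched = info_nce(anchor, positive, [pixel_disturbed, point_disturbed])
# For comparison: treat a disturbed pair as if it were the positive.
loss_disturbed = info_nce(anchor, pixel_disturbed, [positive, point_disturbed])

print("loss with true correspondence:", loss_matched)
print("loss with disturbed pair as positive:", loss_disturbed)
```

In this toy setup the loss is lower when the positive is the truly corresponding pair than when a disturbed pair takes its place, which is the behavior a feature extractor would be trained toward under this kind of objective.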
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
Methods: Contrastive Learning
