Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?
Yifan Zhang, Junhui Hou

TL;DR
This paper introduces CMCR, a framework for 3D representation learning that integrates modality-shared and modality-specific features through masked image modeling, occupancy estimation, and a multi-modal unified codebook, and that outperforms previous image-to-LiDAR contrastive distillation methods.
Contribution
The paper proposes a comprehensive framework, CMCR, that enhances 3D representations by incorporating modality-specific features and a multi-modal codebook, addressing limitations of existing contrastive methods.
Findings
Outperforms existing image-to-LiDAR contrastive distillation methods.
Effectively integrates modality-shared and modality-specific features.
Improves downstream task performance.
Abstract
Cross-modal contrastive distillation has recently been explored for learning effective 3D representations. However, existing methods focus primarily on modality-shared features, neglecting the modality-specific features during the pre-training process, which leads to suboptimal representations. In this paper, we theoretically analyze the limitations of current contrastive methods for 3D representation learning and propose a new framework, namely CMCR (Cross-Modal Comprehensive Representation Learning), to address these shortcomings. Our approach improves upon traditional methods by better integrating both modality-shared and modality-specific features. Specifically, we introduce masked image modeling and occupancy estimation tasks to guide the network in learning more comprehensive modality-specific features. Furthermore, we propose a novel multi-modal unified codebook that learns an…
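For orientation, the baseline objective that the abstract critiques can be sketched as a generic InfoNCE loss over paired point and pixel features. This is a minimal illustration of cross-modal contrastive distillation in general, not the CMCR method or the authors' code; the function name `info_nce`, the temperature value, and the feature shapes are all assumptions made for the example.

```python
import numpy as np

def info_nce(point_feats, pixel_feats, temperature=0.07):
    """Generic InfoNCE loss (hypothetical helper, not from the paper).

    Row i of point_feats is assumed to be paired with row i of pixel_feats;
    every other row in the batch serves as a negative.
    """
    # L2-normalise so the dot product below is a cosine similarity.
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    q = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)

    logits = p @ q.T / temperature                       # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))

# Toy data: 8 paired features of dimension 16 (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
pixel = rng.normal(size=(8, 16))
point = pixel + 0.01 * rng.normal(size=(8, 16))  # nearly aligned 3D features

aligned_loss = info_nce(point, pixel)
mismatched_loss = info_nce(point[::-1], pixel)   # pairs deliberately broken
```

In this toy setup the loss is low when the paired features agree and high when the pairing is broken. Because the objective only rewards agreement between the two modalities, it pulls the 3D features toward what both modalities share, which is the limitation the abstract says the additional modality-specific tasks are meant to address.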
Taxonomy
Topics: Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage · Manufacturing Process and Optimization
Methods: Focus
