Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering

Ben Fei; Yixuan Li; Weidong Yang; Lipeng Ma; Ying He

arXiv:2404.13619·cs.MM·April 23, 2024

TL;DR

DR-Point is a tri-modal pre-training framework that uses differentiable rendering to learn a unified representation of RGB images, depth images, and 3D point clouds, improving performance on downstream 3D understanding tasks.

Contribution

The paper presents a differentiable rendering-based approach to tri-modal pre-training that learns a unified 3D representation space from a limited supply of triplets.

Findings

01 Outperforms existing self-supervised methods in various 3D tasks

02 Enhances point cloud reconstruction accuracy

03 Validates effectiveness through extensive ablation studies

Abstract

State-of-the-art 3D models, which excel in recognition tasks, typically depend on large-scale datasets and well-defined category sets. Recent advances in multi-modal pre-training have demonstrated potential in learning 3D representations by aligning features from 3D shapes with their 2D RGB or depth counterparts. However, these existing frameworks often rely solely on either RGB or depth images, limiting their effectiveness in harnessing a comprehensive range of multi-modal data for 3D applications. To tackle this challenge, we present DR-Point, a tri-modal pre-training framework that learns a unified representation of RGB images, depth images, and 3D point clouds by pre-training with object triplets garnered from each modality. To address the scarcity of such triplets, DR-Point employs differentiable rendering to obtain various depth images. This approach not only augments the supply…
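As a rough illustration of what "aligning features" across the three modalities can look like, the sketch below applies a symmetric InfoNCE-style contrastive loss to each pair of modalities. This is an assumption for illustration only: the excerpt above does not state which objective DR-Point uses, and the function names (`info_nce`, `tri_modal_loss`), the temperature value, and the choice to include the RGB-depth pair are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of features.

    Row i of `anchors` and row i of `positives` are assumed to come from
    the same object; all other rows in the batch act as negatives.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    n = len(a)
    # Cosine similarity between every anchor and every positive, scaled.
    logits = [
        [sum(x * y for x, y in zip(a[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(matrix):
        # Cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed)
    )


def tri_modal_loss(point_feats, rgb_feats, depth_feats, temperature=0.07):
    """Average pairwise alignment loss over the three modalities.

    Hypothetical objective: the paper's actual loss is not given in the
    excerpt, so the equal weighting of pairs is an assumption.
    """
    return (
        info_nce(point_feats, rgb_feats, temperature)
        + info_nce(point_feats, depth_feats, temperature)
        + info_nce(rgb_feats, depth_feats, temperature)
    ) / 3.0
```

With toy features, a batch whose triplets agree across modalities should give a much lower loss than one where the depth features are paired with the wrong objects, for example `tri_modal_loss(f, f, f)` versus `tri_modal_loss(f, f, f[::-1])` for `f = [[1.0, 0.0], [0.0, 1.0]]`.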

Taxonomy

Topics: Image Processing and 3D Reconstruction · Advanced Neural Network Applications · 3D Shape Modeling and Analysis