OpenDlign: Open-World Point Cloud Understanding with Depth-Aligned Images

Ye Mao; Junpeng Jing; Krystian Mikolajczyk

arXiv:2404.16538 · cs.CV · October 1, 2024

TL;DR

OpenDlign introduces depth-aligned images generated by a diffusion model to improve open-world 3D understanding. It achieves stronger zero-shot and few-shot performance than prior methods by exploiting the richer textures of these images, a refined depth-map projection pipeline, and depth-specific prompts.

Contribution

The paper presents a novel method using diffusion-generated depth-aligned images for robust multimodal 3D representation learning with minimal fine-tuning.
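Multimodal 3D representation learning of this kind aligns point-cloud embeddings with image (and text) embeddings from a pretrained VLM. Assuming a CLIP-style contrastive objective, the alignment step can be sketched as a symmetric InfoNCE loss between paired point-cloud and depth-aligned-image embeddings. This is a generic sketch, not OpenDlign's training code: the function names, the 0.07 temperature, and the toy vectors are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(point_feats: List[Vector],
                       image_feats: List[Vector],
                       temperature: float = 0.07) -> float:
    """Generic CLIP-style loss (assumed): pair i of points and images is the positive."""
    p = [normalize(v) for v in point_feats]
    q = [normalize(v) for v in image_feats]
    # logits[i][j] = cosine(point_i, image_j) / temperature
    logits = [[sum(a * b for a, b in zip(pi, qj)) / temperature for qj in q] for pi in p]
    transposed = [list(col) for col in zip(*logits)]
    # Average the point->image and image->point directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))
```

Correctly paired embeddings give a loss near zero, while swapped pairs are heavily penalized; in practice the embeddings would come from a point-cloud encoder and a frozen image encoder rather than hand-written vectors.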

Findings

01. Outperforms previous models by 8.0% on ModelNet40 in zero-shot classification.
02. Outperforms previous models by 16.4% on OmniObject3D in zero-shot classification.
03. Depth-aligned images also improve the performance of other state-of-the-art models.
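Zero-shot classification in VLM-aligned 3D models generally works by embedding a shape, embedding a text prompt for each candidate class, and picking the class with the highest cosine similarity. The sketch below assumes that scheme; the template "a depth map of a {}." is a hypothetical stand-in for OpenDlign's depth-specific prompts, and the hand-written vectors stand in for encoder outputs.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_prompts(class_names: List[str],
                  template: str = "a depth map of a {}.") -> Dict[str, str]:
    """Fill a hypothetical depth-specific template; these strings would go to a text encoder."""
    return {name: template.format(name) for name in class_names}


def zero_shot_classify(shape_embedding: Vector,
                       class_text_embeddings: Dict[str, Vector]) -> str:
    """Return the class whose text embedding is most similar to the shape embedding."""
    return max(class_text_embeddings,
               key=lambda name: cosine(shape_embedding, class_text_embeddings[name]))


# Toy embeddings standing in for text-encoder outputs of each class prompt.
class_embeddings = {
    "chair": [0.9, 0.1, 0.0],
    "airplane": [0.0, 0.2, 0.95],
    "lamp": [0.1, 0.9, 0.1],
}
print(build_prompts(["chair"]))
print(zero_shot_classify([0.8, 0.2, 0.1], class_embeddings))
```

No class-specific training is needed: adding a new category only requires embedding one more prompt, which is what makes the setting "zero-shot".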

Abstract

Recent open-world 3D representation learning methods that use Vision-Language Models (VLMs) to align 3D point clouds with image-text information have shown superior 3D zero-shot performance. However, the CAD-rendered images used for this alignment often lack realism and texture variation, compromising alignment robustness. Moreover, the volume discrepancy between 3D and 2D pretraining datasets highlights the need for effective strategies to transfer the representational abilities of VLMs to 3D learning. In this paper, we present OpenDlign, a novel open-world 3D model using depth-aligned images generated from a diffusion model for robust multimodal alignment. These images exhibit greater texture diversity than CAD renderings due to the stochastic nature of the diffusion model. By refining the depth map projection pipeline and designing depth-specific prompts, OpenDlign leverages rich knowledge in…
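The abstract refers to a depth map projection pipeline, which turns a point cloud into a 2D depth image that an image encoder can consume. OpenDlign's refined pipeline is not reproduced here. As a generic baseline, assumed for illustration, a single-view orthographic projection with a z-buffer can be sketched as follows; the function name and resolution are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_depth_map(points: Sequence[Point],
                         resolution: int = 4) -> List[List[Optional[float]]]:
    """Orthographically project a non-empty point cloud along z onto a square grid.

    Each pixel keeps the smallest z value that lands in it (a z-buffer), i.e. the
    point nearest a viewer looking along +z. Pixels hit by no point stay None.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min, y_max = min(xs), min(ys), max(ys)
    # One shared scale for x and y preserves the aspect ratio; fall back to 1.0
    # when all points share the same x and y.
    span = max(max(xs) - x_min, y_max - y_min) or 1.0
    depth: List[List[Optional[float]]] = [[None] * resolution for _ in range(resolution)]
    for x, y, z in points:
        col = min(int((x - x_min) / span * resolution), resolution - 1)
        # Image rows grow downward, so larger y maps to a smaller row index.
        row = min(int((y_max - y) / span * resolution), resolution - 1)
        current = depth[row][col]
        if current is None or z < current:
            depth[row][col] = z
    return depth


print(project_to_depth_map([(0.0, 0.0, 5.0), (0.0, 0.0, 2.0), (1.0, 1.0, 3.0)], resolution=2))
```

Multiple views would typically be obtained by rotating the cloud before projecting, and the raw depths would be normalized and filled in before being passed to an image encoder or used as conditioning for a depth-conditioned diffusion model.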

Code & Models

Repositories

Yebulabula/OpenDlign (PyTorch, official)

Videos

OpenDlign: Open-World Point Cloud Understanding with Depth-Aligned Images · SlidesLive

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques

Methods: Diffusion · ALIGN