OpenDlign: Open-World Point Cloud Understanding with Depth-Aligned Images
Ye Mao, Junpeng Jing, Krystian Mikolajczyk

TL;DR
OpenDlign introduces depth-aligned images generated by a diffusion model to improve open-world 3D understanding, achieving superior zero-shot and few-shot performance by leveraging richer textures and refined alignment techniques.
Contribution
The paper presents a novel method using diffusion-generated depth-aligned images for robust multimodal 3D representation learning with minimal fine-tuning.
Findings
Outperforms previous models by 8.0% on ModelNet40 in zero-shot classification.
Improves zero-shot performance on OmniObject3D by 16.4%.
Depth-aligned images enhance the performance of other state-of-the-art models.
Abstract
Recent open-world 3D representation learning methods using Vision-Language Models (VLMs) to align 3D point clouds with image-text information have shown superior 3D zero-shot performance. However, CAD-rendered images for this alignment often lack realism and texture variation, compromising alignment robustness. Moreover, the volume discrepancy between 3D and 2D pretraining datasets highlights the need for effective strategies to transfer the representational abilities of VLMs to 3D learning. In this paper, we present OpenDlign, a novel open-world 3D model using depth-aligned images generated from a diffusion model for robust multimodal alignment. These images exhibit greater texture diversity than CAD renderings due to the stochastic nature of the diffusion model. By refining the depth map projection pipeline and designing depth-specific prompts, OpenDlign leverages rich knowledge in…
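The abstract names two generic ingredients before it is cut off: projecting a point cloud to depth maps, and aligning those views with image-text embeddings from a VLM for zero-shot recognition. The Python sketch below illustrates both steps under stated assumptions. It projects a point cloud with a plain orthographic z-buffer and then performs CLIP-style zero-shot classification as a softmax over cosine similarities between a shape embedding and per-class prompt embeddings. It is not OpenDlign's refined projection pipeline, diffusion step, or encoder. The function names, the 8x8 resolution, the depth normalization, the temperature of 0.01, and the toy embeddings are all hypothetical. In a real system, the embeddings would come from a pretrained VLM applied to the depth map and to depth-specific text prompts.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_depth_map(points: Sequence[Point], size: int = 8) -> List[List[float]]:
    """Generic orthographic depth projection with a z-buffer (hypothetical sketch,
    not OpenDlign's refined pipeline).

    The cloud is centered and scaled into [-1, 1]^3, viewed from +z, and splatted
    onto a size x size grid. Each pixel keeps its nearest point; empty pixels are 0.0.
    """
    if not points:
        raise ValueError("point cloud must be non-empty")
    n = len(points)
    center = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [tuple(p[i] - center[i] for i in range(3)) for p in points]
    scale = max(abs(c) for p in centered for c in p) or 1.0  # avoid divide-by-zero
    depth = [[0.0] * size for _ in range(size)]
    for x, y, z in centered:
        x, y, z = x / scale, y / scale, z / scale
        col = min(int((x + 1) / 2 * size), size - 1)  # x in [-1, 1] -> column index
        row = min(int((1 - y) / 2 * size), size - 1)  # +y maps to the top row
        value = (z + 3) / 4  # z in [-1, 1] -> [0.5, 1.0]; nearer points are brighter
        depth[row][col] = max(depth[row][col], value)  # z-buffer: keep the nearest point
    return depth


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def zero_shot_classify(
    shape_embedding: Sequence[float],
    prompt_embeddings: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # hypothetical value
) -> Tuple[str, Dict[str, float]]:
    """CLIP-style zero-shot classification: softmax over scaled cosine similarities."""
    logits = {c: cosine(shape_embedding, e) / temperature for c, e in prompt_embeddings.items()}
    peak = max(logits.values())
    exps = {c: math.exp(v - peak) for c, v in logits.items()}  # subtract max for stability
    total = sum(exps.values())
    probs = {c: v / total for c, v in exps.items()}
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    depth = project_to_depth_map([(0, 0, 1), (0, 0, -1), (1, 1, 0), (-1, -1, 0)])
    # Toy vectors stand in for VLM embeddings of the depth map and of hypothetical
    # depth-specific prompts such as "a depth map of a chair."
    label, probs = zero_shot_classify([1.0, 0.1, 0.0], {"chair": [1, 0, 0], "table": [0, 1, 0]})
    print(label)  # → chair
```

Subtracting the largest logit before exponentiating keeps the softmax numerically stable. The predicted class is simply the prompt whose embedding is most similar to the shape embedding, so no 3D-specific classifier head is trained.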
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques
Methods: Diffusion · ALIGN
