OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding
Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, Hao Su

TL;DR
OpenShape is a scalable multi-modal framework that learns joint text, image, and 3D shape representations, enabling open-world 3D recognition and interactions with state-of-the-art zero-shot performance.
Contribution
It introduces a scalable training approach with data filtering, network scaling, and a novel hard negative mining module for improved 3D shape understanding.
Findings
Achieves 46.8% zero-shot accuracy on the 1,156-category Objaverse-LVIS benchmark
Reaches 85.3% accuracy on ModelNet40, outperforming previous methods
Encodes diverse visual and semantic concepts for fine-grained interactions
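As a rough illustration of how zero-shot accuracies like those above are typically measured in a joint embedding space, the sketch below assigns a shape to the category whose text embedding is most similar to the shape embedding. It is a minimal sketch, not OpenShape's evaluation code: the function names `cosine` and `zero_shot_classify` are hypothetical, and the embeddings are hand-made toy vectors rather than outputs of any real encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(shape_emb, text_embs):
    """Return the category whose text embedding is closest to the shape embedding.

    shape_emb: list of floats (stand-in for a 3D encoder output; assumption)
    text_embs: dict mapping category name -> list of floats
               (stand-in for text-encoder outputs on category names; assumption)
    """
    return max(text_embs, key=lambda name: cosine(shape_emb, text_embs[name]))

# Toy embeddings purely for illustration; not real model outputs.
text_embs = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],
}
shape_emb = [0.9, 0.2, 0.1]
prediction = zero_shot_classify(shape_emb, text_embs)
print(prediction)
```

Because no classifier is trained on the benchmark's labels, the category list can be swapped freely at test time, which is what makes this style of evaluation "zero-shot".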
Abstract
We introduce OpenShape, a method for learning multi-modal joint representations of text, image, and point clouds. We adopt the commonly used multi-modal contrastive learning framework for representation alignment, but with a specific focus on scaling up 3D representations to enable open-world 3D shape understanding. To achieve this, we scale up training data by ensembling multiple 3D datasets and propose several strategies to automatically filter and enrich noisy text descriptions. We also explore and compare strategies for scaling 3D backbone networks and introduce a novel hard negative mining module for more efficient training. We evaluate OpenShape on zero-shot 3D classification benchmarks and demonstrate its superior capabilities for open-world recognition. Specifically, OpenShape achieves a zero-shot accuracy of 46.8% on the 1,156-category Objaverse-LVIS benchmark, compared to less…
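The contrastive alignment mentioned in the abstract can be sketched with a symmetric InfoNCE-style loss, the form commonly used in multi-modal contrastive learning. This is a minimal sketch under stated assumptions: the exact loss, the temperature value of 0.07, the pairing of the shape embedding with both text and image embeddings, and all function names are illustrative choices, not details confirmed by the text above. The hard negative mining module is not modelled here.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def log_softmax_diag(logits):
    """For each row i, return log softmax(row_i)[i] (the matched-pair term)."""
    out = []
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        out.append(row[i] - lse)
    return out

def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of paired embeddings.

    a[i] and b[i] are treated as a positive pair; every other
    combination in the batch acts as a negative.
    """
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b]
              for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    n = len(a)
    loss_ab = -sum(log_softmax_diag(logits)) / n
    loss_ba = -sum(log_softmax_diag(logits_t)) / n
    return 0.5 * (loss_ab + loss_ba)

# Toy batch of two objects; embeddings are made up for illustration.
shape = [[1.0, 0.0], [0.0, 1.0]]
text = [[0.9, 0.1], [0.2, 0.8]]
image = [[1.0, 0.1], [0.1, 1.0]]

# Assumed objective: align the 3D embedding with both other modalities.
total = contrastive_loss(shape, text) + contrastive_loss(shape, image)

aligned = contrastive_loss(shape, shape)         # perfectly matched pairs
shuffled = contrastive_loss(shape, shape[::-1])  # deliberately mismatched pairs
print(total, aligned, shuffled)
```

The loss is small when each shape embedding is closest to its own text or image embedding and large when the pairs are mismatched, which is the signal that pulls the three modalities into one space.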
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning · Contrastive Language-Image Pre-training
