OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

Minghua Liu; Ruoxi Shi; Kaiming Kuang; Yinhao Zhu; Xuanlin Li; Shizhong Han; Hong Cai; Fatih Porikli; Hao Su

arXiv:2305.10764 · cs.CV · June 21, 2023 · 39 cites

TL;DR

OpenShape is a scalable multi-modal framework that learns joint text, image, and 3D shape representations, enabling open-world 3D recognition and interactions with state-of-the-art zero-shot performance.

Contribution

It introduces a scalable training approach with data filtering, network scaling, and a novel hard negative mining module for improved 3D shape understanding.

Findings

01 Achieves 46.8% zero-shot accuracy on Objaverse-LVIS benchmark

02 Outperforms previous methods with 85.3% accuracy on ModelNet40

03 Encodes diverse visual and semantic concepts for fine-grained interactions
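
The accuracy figures above are for zero-shot classification. As a rough illustration of how such a score can be computed from a joint embedding space, the sketch below assigns each shape to the category whose text embedding is most similar to it. It assumes cosine similarity as the matching rule, which is the common choice for CLIP-style models but is not stated on this page. The function names and the toy 3-dimensional embeddings are hypothetical and do not come from the OpenShape code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(shape_embedding, text_embeddings):
    """Return the category whose text embedding is closest to the shape.

    text_embeddings maps a category name to its embedding (hypothetical
    stand-in for the output of a text encoder on a prompt for that name).
    """
    return max(
        text_embeddings,
        key=lambda name: cosine(shape_embedding, text_embeddings[name]),
    )


def zero_shot_accuracy(shape_embeddings, labels, text_embeddings):
    """Fraction of shapes whose predicted category equals the true label."""
    correct = sum(
        zero_shot_classify(emb, text_embeddings) == label
        for emb, label in zip(shape_embeddings, labels)
    )
    return correct / len(labels)


# Toy data: three categories and three shapes (illustrative values only).
toy_text = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0], "lamp": [0.0, 0.0, 1.0]}
toy_shapes = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.6, 0.5, 0.1]]
toy_labels = ["chair", "table", "table"]
toy_accuracy = zero_shot_accuracy(toy_shapes, toy_labels, toy_text)
```

No 3D category labels are used to fit a classifier here, which is why the setup is called zero-shot: adding a new category only requires a new text embedding.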

Abstract

We introduce OpenShape, a method for learning multi-modal joint representations of text, image, and point clouds. We adopt the commonly used multi-modal contrastive learning framework for representation alignment, but with a specific focus on scaling up 3D representations to enable open-world 3D shape understanding. To achieve this, we scale up training data by ensembling multiple 3D datasets and propose several strategies to automatically filter and enrich noisy text descriptions. We also explore and compare strategies for scaling 3D backbone networks and introduce a novel hard negative mining module for more efficient training. We evaluate OpenShape on zero-shot 3D classification benchmarks and demonstrate its superior capabilities for open-world recognition. Specifically, OpenShape achieves a zero-shot accuracy of 46.8% on the 1,156-category Objaverse-LVIS benchmark, compared to less…
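
The abstract describes the training objective only as a "commonly used multi-modal contrastive learning framework". A minimal sketch of one such objective, a symmetric InfoNCE-style loss between a batch of 3D shape embeddings and the matching text or image embeddings, might look like the following. The function names, the temperature of 0.07, and the toy vectors are assumptions made for illustration; the paper's actual loss, its hard negative mining module, and its hyperparameters may differ.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(shape_emb, other_emb, temperature=0.07):
    """Symmetric contrastive loss between paired embeddings.

    shape_emb[i] and other_emb[i] are assumed to describe the same object;
    every other pairing in the batch is treated as a negative.
    The temperature value is an assumed default, not taken from the paper.
    """
    a = [l2_normalize(v) for v in shape_emb]
    b = [l2_normalize(v) for v in other_emb]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the shape-to-other and other-to-shape directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy check: correctly paired embeddings should give a much lower loss
# than the same embeddings paired in the wrong order.
aligned_loss = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In a three-modality setup, a loss of this form could be applied once for shape-text pairs and once for shape-image pairs and the two terms summed; that combination is also an assumption here rather than something the abstract specifies.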

Code & Models

Repositories

Colin97/OpenShape_code
pytorch

Videos

OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding · slideslive

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning · Contrastive Language-Image Pre-training