Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Ziyu Guo; Renrui Zhang; Xiangyang Zhu; Yiwen Tang; Xianzheng Ma; Jiaming Han; Kexin Chen; Peng Gao; Xianzhi Li; Hongsheng Li; Pheng-Ann Heng

arXiv:2309.00615 · cs.CV · September 4, 2023 · 5 citations



TL;DR

This paper introduces Point-Bind, a multi-modal 3D model aligning point clouds with various modalities, and Point-LLM, a 3D large language model capable of understanding and following multi-modal instructions, advancing 3D understanding and generation.

Contribution

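The work presents Point-Bind, a multi-modal 3D model that aligns point clouds with 2D images, language, audio, and video, and Point-LLM, described as the first 3D large language model to follow 3D multi-modal instructions. Point-LLM builds on Point-Bind's multi-modal alignment and requires no 3D instruction data.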

Findings

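01. Point-Bind aligns 3D point clouds with 2D images, language, audio, and video in a joint embedding space.

02. Point-LLM exhibits superior 3D and multi-modal question-answering capacity without being trained on 3D instruction data.

03. The models enable new applications such as any-to-3D generation, 3D embedding arithmetic, and 3D open-world understanding.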
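To make "3D embedding arithmetic" concrete, here is a minimal sketch of how composition and retrieval could work once several modalities share one embedding space. It is not the authors' implementation: the three-dimensional vectors, the gallery names, and the helper functions `compose` and `retrieve` are all hypothetical and chosen by hand for illustration. In practice the embeddings would come from trained encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def compose(*embeddings):
    """Embedding arithmetic: sum the inputs element-wise, then re-normalize."""
    summed = [sum(values) for values in zip(*embeddings)]
    return normalize(summed)

def retrieve(query, gallery):
    """Return the gallery key whose embedding is most similar to the query."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))

# Hypothetical toy embeddings (assumption: hand-picked, not real encoder outputs).
point_cloud_car = [1.0, 0.0, 0.0]   # stands in for a 3D point cloud embedding
audio_waves = [0.0, 1.0, 0.0]       # stands in for an audio embedding
image_gallery = {
    "car_on_road": [0.9, 0.1, 0.0],
    "sea_waves": [0.1, 0.9, 0.0],
    "car_by_the_sea": [0.7, 0.7, 0.0],
    "airplane": [0.0, 0.0, 1.0],
}

# Point cloud alone retrieves the plain car image.
single = retrieve(point_cloud_car, image_gallery)

# Point cloud + audio retrieves the image that combines both concepts.
combined = retrieve(compose(point_cloud_car, audio_waves), image_gallery)

print(single, combined)
```

With these toy vectors, the point cloud alone retrieves `car_on_road`, while the sum of the point cloud and audio embeddings retrieves `car_by_the_sea`. The re-normalization in `compose` keeps the query on the unit sphere, so similarity scores stay comparable across single and composed queries.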

Abstract

We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video. Guided by ImageBind, we construct a joint embedding space between 3D and multi-modalities, enabling many promising applications, e.g., any-to-3D generation, 3D embedding arithmetic, and 3D open-world understanding. On top of this, we further present Point-LLM, the first 3D large language model (LLM) following 3D multi-modal instructions. By parameter-efficient fine-tuning techniques, Point-LLM injects the semantics of Point-Bind into pre-trained LLMs, e.g., LLaMA, which requires no 3D instruction data, but exhibits superior 3D and multi-modal question-answering capacity. We hope our work may cast a light on the community for extending 3D point clouds to multi-modality applications. Code is available at https://github.com/ZiyuGuo99/Point-Bind_Point-LLM.
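The abstract does not state the training objective used to build the joint embedding space. One common way to align a new encoder with an existing embedding space is a symmetric contrastive (InfoNCE) loss of the kind used in CLIP-style training. The sketch below assumes that objective purely for illustration; the function names, the temperature value of 0.07, and the two-dimensional toy embeddings are assumptions, not details taken from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(anchors, targets, temperature=0.07):
    """Mean cross-entropy of matching anchor i to target i among all targets."""
    anchors = [normalize(a) for a in anchors]
    targets = [normalize(t) for t in targets]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Temperature-scaled cosine similarities against every target in the batch.
        logits = [sum(x * y for x, y in zip(anchor, t)) / temperature for t in targets]
        # Numerically stable log-sum-exp.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def symmetric_contrastive_loss(point_embs, other_embs, temperature=0.07):
    """Average of the 3D-to-other and other-to-3D InfoNCE terms."""
    return 0.5 * (
        info_nce(point_embs, other_embs, temperature)
        + info_nce(other_embs, point_embs, temperature)
    )

# Hypothetical batch of two paired samples (assumption: toy 2-D embeddings).
point_embs = [[1.0, 0.0], [0.0, 1.0]]
paired_embs = [[1.0, 0.0], [0.0, 1.0]]      # each point cloud matches its pair
mismatched_embs = [[0.0, 1.0], [1.0, 0.0]]  # pairs swapped

aligned_loss = symmetric_contrastive_loss(point_embs, paired_embs)
mismatched_loss = symmetric_contrastive_loss(point_embs, mismatched_embs)
print(aligned_loss < mismatched_loss)
```

The loss is low when each point cloud embedding is closest to its own paired embedding and high when the pairs are swapped, which is the property a contrastive objective relies on to pull matching 3D and non-3D embeddings together.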


Code & Models

Models

🤗 YuanTang96/GreenPLM · model · 10 downloads · 1 like


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning