MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Yuan Tang; Xu Han; Xianzhi Li; Qiao Yu; Yixue Hao; Long Hu; Min Chen

arXiv:2405.01413 · cs.CV · May 3, 2024 · 1 citation

TL;DR

MiniGPT-3D introduces an efficient method for aligning 3D point clouds with large language models by leveraging 2D priors, achieving state-of-the-art results with significantly reduced training costs and parameters.

Contribution

The paper proposes a four-stage training strategy, a mixture of query experts module, and parameter-efficient fine-tuning to build MiniGPT-3D, a low-cost 3D-LLM with state-of-the-art performance.
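To make the "mixture of query experts" idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `MixtureOfQueryExperts`, the linear router, the dense softmax weighting, and all sizes are assumptions chosen for illustration. The sketch keeps several sets of learnable query tokens ("experts") and blends them with weights computed from a pooled point-cloud feature.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MixtureOfQueryExperts:
    """Hypothetical sketch: blend several sets of learnable query tokens.

    Assumed design (not taken from the paper): a linear router scores each
    expert from a pooled point-cloud feature, and the output is the
    softmax-weighted sum of all experts' query sets.
    """

    def __init__(self, num_experts, num_queries, dim, seed=0):
        rng = random.Random(seed)
        # experts[e][q] is one query vector of length `dim`.
        self.experts = [
            [[rng.gauss(0.0, 0.02) for _ in range(dim)]
             for _ in range(num_queries)]
            for _ in range(num_experts)
        ]
        # One router weight vector per expert.
        self.router = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def route(self, point_feature):
        """Return softmax weights over experts for one pooled feature."""
        logits = [
            sum(w * x for w, x in zip(row, point_feature))
            for row in self.router
        ]
        return softmax(logits)

    def mix(self, point_feature):
        """Return (mixed_queries, weights) for one pooled feature."""
        weights = self.route(point_feature)
        num_queries = len(self.experts[0])
        dim = len(self.experts[0][0])
        mixed = [[0.0] * dim for _ in range(num_queries)]
        for weight, expert in zip(weights, self.experts):
            for q in range(num_queries):
                for d in range(dim):
                    mixed[q][d] += weight * expert[q][d]
        return mixed, weights
```

Calling `MixtureOfQueryExperts(num_experts=4, num_queries=8, dim=16).mix([0.1] * 16)` returns 8 blended query vectors of length 16 together with 4 routing weights that sum to 1. A sparse top-k router would be an equally plausible variant; the dense version is used here only because it is the simplest to read.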

Findings

01. Achieves SOTA results on 3D classification and captioning

02. Cuts training time to 27 hours on a single RTX 3090 GPU

03. Scores 8.12 points higher than ShapeLLM-13B on object captioning

Abstract

Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLM requires expensive training costs, typically in hundreds of GPU-hours on A100, which hinders the development of 3D-LLMs. In this paper, we introduce MiniGPT-3D, an efficient and powerful 3D-LLM that achieves multiple SOTA results while training for only 27 hours on one RTX 3090. Specifically, we propose to align 3D point clouds with LLMs using 2D priors from 2D-LLMs, which can leverage the similarity between 2D and 3D visual information. We introduce a novel four-stage training strategy for modality alignment in a cascaded way, and a mixture of query experts module to…

Code & Models

Repositories

tangyuan96/minigpt-3d (PyTorch, official)

Models

YuanTang96/GreenPLM (Hugging Face model · 10 downloads · 1 like)

Taxonomy

Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Advanced Neural Network Applications

Methods: Attention Is All You Need · Dense Connections · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer