Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

Junhao Chen; Xiang Li; Xiaojun Ye; Chao Li; Zhaoxin Fan; and Hao Zhao

arXiv:2404.04363·cs.CV·December 19, 2024·1 cites

TL;DR

Idea23D introduces a novel collaborative framework using large multimodal models to generate 3D content from complex multimodal inputs, significantly advancing 3D AIGC capabilities.

Contribution

This work presents the first exploration of 3D content generation from multimodal IDEAs using a multi-agent LMM-based system, introducing a new framework and dataset.

Findings

01. Outperforms previous 3D AIGC methods in success rate and accuracy

02. Demonstrates effective collaboration among LMM agents for 3D generation

03. Achieves high compatibility with various models and inputs

Abstract

With the success of 2D diffusion models, 2D AIGC content has already transformed our lives. Recently, this success has been extended to 3D AIGC, with state-of-the-art methods generating textured 3D models from single images or text. However, we argue that current 3D AIGC methods still do not fully unleash human creativity. We often imagine 3D content made from multimodal inputs, such as what it would look like if my pet bunny were eating a doughnut on the table. In this paper, we explore a novel 3D AIGC approach: generating 3D content from IDEAs. An IDEA is a multimodal input composed of text, image, and 3D models. To our knowledge, this challenging and exciting 3D AIGC setting has not been studied before. We propose the new framework Idea23D, which combines three agents based on large multimodal models (LMMs) and existing algorithmic tools. These three LMM-based agents are tasked with…
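The abstract defines an IDEA as a multimodal input composed of text, images, and 3D models, and the title describes these inputs as interleaved. As a minimal sketch of how such an input could be represented, the following uses an ordered list of typed parts. The names `Part`, `Idea`, `modalities`, and `as_prompt`, the placeholder format, and the file names are all hypothetical illustrations; none of this is taken from the Idea23D repository or its API.

```python
from dataclasses import dataclass
from typing import List, Literal

# Hypothetical modality tags; the abstract names text, image, and 3D models.
Modality = Literal["text", "image", "3d"]


@dataclass(frozen=True)
class Part:
    """One element of an interleaved input (illustrative, not the paper's API)."""
    modality: Modality
    content: str  # literal text, or a file path for image / 3D parts


@dataclass
class Idea:
    """An ordered sequence of parts; order is preserved to keep the interleaving."""
    parts: List[Part]

    def modalities(self) -> List[str]:
        # Distinct modalities in order of first appearance.
        seen: List[str] = []
        for part in self.parts:
            if part.modality not in seen:
                seen.append(part.modality)
        return seen

    def as_prompt(self) -> str:
        # Flatten to one string, replacing non-text parts with tagged placeholders.
        pieces = []
        for part in self.parts:
            if part.modality == "text":
                pieces.append(part.content)
            else:
                pieces.append(f"<{part.modality}:{part.content}>")
        return " ".join(pieces)


# Hypothetical example loosely following the abstract's bunny-and-doughnut scenario.
idea = Idea([
    Part("image", "bunny.jpg"),
    Part("text", "eating a doughnut on"),
    Part("3d", "table.obj"),
])
print(idea.modalities())
print(idea.as_prompt())
```

Keeping the parts in a single ordered list, rather than in separate per-modality fields, is one way to preserve where each image or 3D asset sits relative to the surrounding text.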

Code & Models

Repositories

yisuanwang/idea23d · PyTorch · Official

Models

🤗 yisuanwang/Idea23D · model

Taxonomy

Topics: Human Motion and Animation · Speech and dialogue systems · Robotics and Automated Systems

Methods: Diffusion · ALIGN