LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

Zhengyi Wang; Jonathan Lorraine; Yikai Wang; Hang Su; Jun Zhu; Sanja; Fidler; Xiaohui Zeng

arXiv:2411.09595·cs.LG·November 15, 2024·3 cites

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja, Fidler, Xiaohui Zeng

PDF

Open Access 7 Models

TL;DR

LLaMA-Mesh enables large language models to generate and understand 3D meshes by representing mesh data as text, unifying 3D and language modalities with high-quality results.

Contribution

This work introduces a novel text-based representation for 3D meshes and fine-tunes LLMs to generate and interpret 3D meshes directly from text prompts.

Findings

01

Achieves mesh generation quality comparable to specialized models.

02

Enables LLMs to understand and produce 3D meshes in a unified framework.

03

Maintains strong text generation performance after fine-tuning.

Abstract

This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model. This offers key advantages of (1) leveraging spatial knowledge already embedded in LLMs, derived from textual sources like 3D tutorials, and (2) enabling conversational 3D generation and mesh understanding. A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly. To address this, we introduce LLaMA-Mesh, a novel approach that represents the vertex coordinates and face definitions of 3D meshes as plain text, allowing direct integration with LLMs without expanding the vocabulary. We construct a supervised fine-tuning (SFT) dataset enabling pretrained LLMs to (1) generate 3D meshes from text prompts, (2) produce interleaved text and 3D mesh outputs as required, and (3) understand and interpret…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Models

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

Topics3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Human Motion and Animation