VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification

Jianmeng Liu; Yichen Liu; Yuyao Zhang; Zeyuan Meng; Yu-Wing Tai; Chi-Keung Tang

arXiv:2406.05543 · cs.CV · June 11, 2024


TL;DR

This paper introduces VP-LLM, a method that uses large language models to perform text-conditioned 3D volume completion: the incomplete shape is encoded as 3D patches, which are fed to the LLM together with textual instructions. The authors report better results than existing diffusion-based completion models.

Contribution

The paper presents an approach that uses large language models for 3D volume completion. The LLM makes it possible to follow complex text instructions, and patchification makes it possible to complete a shape in a single forward pass.
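Patchification can be illustrated with a small round trip: split a voxel volume into non-overlapping cubes, flatten each cube so it can be encoded on its own, then write the cubes back into the grid. This is a minimal sketch under stated assumptions, not the authors' code: the page does not give the volume resolution, patch size, or patch ordering, so it assumes a cubic D x D x D grid with D divisible by a patch size p, raster patch order, and hypothetical helper names `patchify` and `unpatchify`.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as volume[x][y][z]


def patchify(volume: Volume, p: int) -> List[List[float]]:
    """Split a cubic D x D x D volume into non-overlapping p x p x p patches.

    Assumption: patches are returned in raster order (x-block, then y-block,
    then z-block), each flattened to p**3 values so it can be encoded alone.
    """
    d = len(volume)
    if d % p != 0:
        raise ValueError("volume size must be divisible by the patch size")
    n = d // p  # patches per axis
    patches = []
    for bx in range(n):
        for by in range(n):
            for bz in range(n):
                patches.append([
                    volume[bx * p + i][by * p + j][bz * p + k]
                    for i in range(p)
                    for j in range(p)
                    for k in range(p)
                ])
    return patches


def unpatchify(patches: List[List[float]], d: int, p: int) -> Volume:
    """Inverse of patchify: write each flattened patch back into a D^3 grid."""
    n = d // p
    if len(patches) != n ** 3:
        raise ValueError("expected (d // p) ** 3 patches")
    volume = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for idx, patch in enumerate(patches):
        bx, rem = divmod(idx, n * n)  # recover the block coordinates
        by, bz = divmod(rem, n)
        values = iter(patch)
        for i in range(p):
            for j in range(p):
                for k in range(p):
                    volume[bx * p + i][by * p + j][bz * p + k] = next(values)
    return volume


if __name__ == "__main__":
    d, p = 8, 4
    vol = [[[float(x * 100 + y * 10 + z) for z in range(d)]
            for y in range(d)] for x in range(d)]
    patches = patchify(vol, p)
    print(len(patches), len(patches[0]))  # 8 64
    assert unpatchify(patches, d, p) == vol
```

Because each patch is a fixed-length vector that does not depend on its neighbours, every patch can be encoded independently. A model that predicts all patches at once can then rebuild the full volume in one pass.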

Findings

01. Outperforms state-of-the-art diffusion-based 3D completion models.
02. Demonstrates that LLMs can interpret complex text instructions.
03. Encodes 3D patches effectively for semantic understanding.

Abstract

Recent conditional 3D completion works have mainly relied on CLIP or BERT to encode textual information, which cannot support complex instructions. Meanwhile, large language models (LLMs) have shown great potential in multi-modal understanding and generation tasks. Inspired by recent advances in LLMs, we present Volume Patch LLM (VP-LLM), which leverages LLMs to perform conditional 3D completion in a single forward pass. To integrate a 3D model into the LLM tokenization configuration, the incomplete 3D object is first divided into small patches that can be encoded independently. These encoded patches are then fed into an LLM along with the text prompt, instructing the LLM to capture the relations between these patches and to inject semantic meaning into the 3D object. Our results demonstrate a strong ability of LLMs to interpret complex text instructions and understand 3D…
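The input side of this pipeline (encoded patches fed to the LLM alongside the text prompt) can be sketched as building one embedding sequence. This is a hedged illustration, not VP-LLM's actual encoder: the abstract does not describe the patch encoder, hidden size, or how output patches are decoded. The sketch therefore assumes a hypothetical `PatchProjector` (a single shared linear layer), a hypothetical `build_input_sequence` helper, and random stand-in prompt embeddings in place of a real tokenizer and embedding table.

```python
import random
from typing import List

Vector = List[float]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """y = W x + b, with weight stored as out_dim rows of length in_dim."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


class PatchProjector:
    """Hypothetical stand-in for a patch encoder: one shared linear layer that
    maps each flattened patch (length patch_dim) to the LLM hidden size."""

    def __init__(self, patch_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / patch_dim ** 0.5
        self.weight = [[rng.uniform(-scale, scale) for _ in range(patch_dim)]
                       for _ in range(hidden_dim)]
        self.bias = [0.0] * hidden_dim

    def __call__(self, patch: Vector) -> Vector:
        return linear(patch, self.weight, self.bias)


def build_input_sequence(text_embeddings: List[Vector], patches: List[Vector],
                         projector: PatchProjector) -> List[Vector]:
    """Concatenate prompt token embeddings with one embedding per 3D patch.

    Each patch is projected independently with shared weights, so relating
    patches to each other and to the prompt is left to the LLM itself.
    """
    patch_tokens = [projector(p) for p in patches]
    return text_embeddings + patch_tokens


if __name__ == "__main__":
    hidden = 16
    rng = random.Random(1)
    # 5 stand-in text-token embeddings (a real system would use the LLM's embedding table)
    prompt = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(5)]
    # 8 flattened 4x4x4 patches, e.g. from an 8^3 volume
    patches = [[rng.random() for _ in range(64)] for _ in range(8)]
    seq = build_input_sequence(prompt, patches, PatchProjector(64, hidden))
    print(len(seq), len(seq[0]))  # 13 16
```

Prefixing the prompt before the patch tokens is only one plausible layout; the abstract does not say where the text and patch tokens sit relative to each other.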


Taxonomy

Topics: Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis

Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Adam · Attention Dropout · Weight Decay · Linear Layer · Multi-Head Attention · Dropout