TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction

Xuying Zhang; Yutong Liu; Yangguang Li; Renrui Zhang; Yufei Liu; Kai Wang; Wanli Ouyang; Zhiwei Xiong; Peng Gao; Qibin Hou; Ming-Ming Cheng

arXiv:2412.16919·cs.CV·August 12, 2025

TL;DR

TAR3D introduces a novel framework combining a 3D-aware VQ-VAE and GPT to generate high-quality 3D assets by modeling part-by-part geometry composition through next-part prediction.

Contribution

This work pioneers the integration of the next-token prediction paradigm into 3D object generation, enabling detailed, high-quality 3D asset synthesis.

Findings

01. Outperforms existing methods in text-to-3D and image-to-3D tasks

02. Achieves superior generation quality on ShapeNet and Objaverse datasets

03. Effectively models 3D geometry part by part

Abstract

We present TAR3D, a novel framework that consists of a 3D-aware Vector Quantized-Variational AutoEncoder (VQ-VAE) and a Generative Pre-trained Transformer (GPT) to generate high-quality 3D assets. The core insight of this work is to migrate the multimodal unification and promising learning capabilities of the next-token prediction paradigm to conditional 3D object generation. To achieve this, the 3D VQ-VAE first encodes a wide range of 3D shapes into a compact triplane latent space and utilizes a set of discrete representations from a trainable codebook to reconstruct fine-grained geometries under the supervision of query point occupancy. Then, the 3D GPT, equipped with a custom triplane position embedding called TriPE, predicts the codebook index sequence with prefilling prompt tokens in an autoregressive manner so that the composition of 3D geometries can be modeled part by part.…
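The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it uses plain Python lists instead of triplane features, omits TriPE and the occupancy-supervised decoder entirely, and replaces the GPT with a hypothetical stand-in function (`toy_logits`). All names below (`quantize`, `dequantize`, `generate`, `toy_logits`) are assumptions made for illustration. The first half shows the generic vector-quantization step, where each latent vector is replaced by the index of its nearest codebook entry. The second half shows greedy autoregressive decoding that starts from prefilled prompt tokens and appends one codebook index at a time.

```python
import math


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    indices = []
    for f in features:
        dists = [math.dist(f, c) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def dequantize(indices, codebook):
    """Look up the codebook vectors for a sequence of indices."""
    return [codebook[i] for i in indices]


def generate(prompt_tokens, next_token_logits, num_tokens):
    """Greedy autoregressive decoding.

    The sequence is prefilled with the prompt tokens; each step asks the
    model for logits over the codebook and appends the argmax index.
    Only the newly generated indices are returned.
    """
    sequence = list(prompt_tokens)
    for _ in range(num_tokens):
        logits = next_token_logits(sequence)
        best = max(range(len(logits)), key=logits.__getitem__)
        sequence.append(best)
    return sequence[len(prompt_tokens):]


# Toy codebook with three 2-D entries (a real codebook is learned).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.9, 0.1), (0.1, 0.8), (0.05, 0.02)]


def toy_logits(sequence):
    """Hypothetical stand-in for the GPT: always favors (last index + 1) mod K."""
    nxt = (sequence[-1] + 1) % len(codebook)
    return [1.0 if i == nxt else 0.0 for i in range(len(codebook))]


indices = quantize(features, codebook)          # [1, 2, 0]
reconstructed = dequantize(indices, codebook)   # nearest codebook vectors
generated = generate([2], toy_logits, 4)        # [0, 1, 2, 0]
```

In a real system the prompt would come from a text or image encoder, and the generated index sequence would be mapped back through the codebook and decoded into geometry; the sketch only shows how the discrete indices connect the two stages.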

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis

Methods: Attention Is All You Need · Cosine Annealing · Weight Decay · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Linear Layer · Attention Dropout · Byte Pair Encoding · Absolute Position Encodings