TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang, Yutong Liu, Yangguang Li, Renrui Zhang, Yufei Liu, Kai Wang, Wanli Ouyang, Zhiwei Xiong, Peng Gao, Qibin Hou, Ming-Ming Cheng

TL;DR
TAR3D introduces a framework that combines a 3D-aware VQ-VAE with a GPT to generate high-quality 3D assets, modeling the composition of 3D geometry part by part through next-part prediction.
Contribution
This work pioneers the integration of the next-token prediction paradigm into 3D object generation, enabling detailed, high-quality 3D asset synthesis.
Findings
Outperforms existing methods in text-to-3D and image-to-3D tasks
Achieves superior generation quality on ShapeNet and Objaverse datasets
Effectively models 3D geometry part by part
Abstract
We present TAR3D, a novel framework that consists of a 3D-aware Vector Quantized-Variational AutoEncoder (VQ-VAE) and a Generative Pre-trained Transformer (GPT) to generate high-quality 3D assets. The core insight of this work is to migrate the multimodal unification and promising learning capabilities of the next-token prediction paradigm to conditional 3D object generation. To achieve this, the 3D VQ-VAE first encodes a wide range of 3D shapes into a compact triplane latent space and utilizes a set of discrete representations from a trainable codebook to reconstruct fine-grained geometries under the supervision of query point occupancy. Then, the 3D GPT, equipped with a custom triplane position embedding called TriPE, predicts the codebook index sequence with prefilling prompt tokens in an autoregressive manner so that the composition of 3D geometries can be modeled part by part.…
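The two stages described in the abstract can be illustrated with a minimal NumPy sketch: a nearest-neighbor codebook lookup that turns latent vectors into discrete indices, and an autoregressive loop that samples an index sequence after a prefilled prompt. This is not the authors' implementation. The function names `quantize` and `generate_indices`, the `next_logits` callable, and the representation of prompt tokens as plain integers are all assumptions made for illustration; the triplane encoder, TriPE, and the transformer itself are not modeled.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    latents:  (N, D) array, e.g. flattened triplane feature vectors (assumed layout)
    codebook: (K, D) array of code vectors (trainable in a real VQ-VAE)
    Returns (indices of shape (N,), quantized vectors of shape (N, D)).
    """
    # Squared Euclidean distance between every latent and every code: (N, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


def generate_indices(next_logits, prompt, num_tokens, rng, temperature=1.0):
    """Sample a codebook index sequence one token at a time after a prompt.

    next_logits: hypothetical callable, maps a 1-D int array (the sequence so
                 far) to a (K,) array of logits over codebook indices
    prompt:      prefilled condition tokens (stand-in for text/image tokens)
    """
    seq = list(prompt)
    for _ in range(num_tokens):
        logits = np.asarray(next_logits(np.array(seq)), dtype=np.float64)
        logits = logits / temperature
        # Numerically stable softmax over the codebook
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        seq.append(int(rng.choice(len(probs), p=probs)))
    # Return only the generated part, without the prompt prefix
    return np.array(seq[len(prompt):])
```

In a full pipeline of this kind, the sampled indices would be looked up in the codebook and passed to a decoder that reconstructs geometry; that step is omitted here because the summary above does not describe it in detail.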
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis
Methods: Attention Is All You Need · Cosine Annealing · Weight Decay · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Linear Layer · Attention Dropout · Byte Pair Encoding · Absolute Position Encodings
