G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer
Jinzhi Zhang, Feng Xiong, Mu Xu

TL;DR
G3PT introduces a novel cross-scale querying transformer for 3D generation that effectively models unordered 3D data without artificial sequencing, achieving superior quality and revealing power-law scaling behaviors.
Contribution
The paper proposes a scalable coarse-to-fine 3D generative model built on a cross-scale querying transformer, addressing the mismatch between unordered 3D data and the ordered token sequences that autoregressive modeling expects.
Findings
G3PT outperforms previous 3D generation methods in quality and generalization.
The model supports diverse conditional 3D shape generation.
Scaling G3PT uncovers power-law scaling behaviors in 3D generation.
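As a rough illustration of what a power-law scaling claim means in practice: a power law L = a * N^b is a straight line in log-log space, so the exponent can be estimated with an ordinary least-squares fit on the logarithms. The sketch below is not the authors' analysis. The function name `fit_power_law` is hypothetical, and the data points are synthetic values generated from a known law purely to check the fit; they are not results from the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of losses ~ a * sizes**b, done in log-log space.

    Returns (a, b). Hypothetical helper for illustration only.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the regression line in log-log space is the exponent b.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


# Synthetic points drawn from L = 5.0 * N**-0.1 (assumed values, NOT paper data).
sizes = [1e8, 3e8, 1e9, 3e9]
losses = [5.0 * n ** -0.1 for n in sizes]

coefficient, exponent = fit_power_law(sizes, losses)
print(f"a = {coefficient:.3f}, b = {exponent:.3f}")
```

With real measurements the points would scatter around the line, and the quality of the fit in log-log space is what supports or undermines a power-law claim.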
Abstract
Autoregressive transformers have revolutionized generative models in language processing and shown substantial promise in image and video generation. However, these models face significant challenges when extended to 3D generation tasks due to their reliance on next-token prediction to learn token sequences, which is incompatible with the unordered nature of 3D data. Instead of imposing an artificial order on 3D data, in this paper, we introduce G3PT, a scalable coarse-to-fine 3D generative model utilizing a cross-scale querying transformer. The key is to map point-based 3D data into discrete tokens with different levels of detail, naturally establishing a sequential relationship between different levels suitable for autoregressive modeling. Additionally, the cross-scale querying transformer connects tokens globally across different levels of detail without requiring an ordered…
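To make the idea of querying across scales without an imposed order more concrete, here is a minimal sketch of plain dot-product cross-attention, in which a set of finer-level queries reads from a set of coarser-level tokens. It is an assumption-laden toy, not G3PT's architecture: there are no learned projections, no multiple heads, and no tokenizer, and the names `cross_attend`, `coarse_tokens`, and `fine_queries` are hypothetical. The only point it demonstrates is that the attention output does not depend on the order in which the coarse tokens are listed.

```python
import math
import random


def cross_attend(queries, context):
    """Single-head dot-product cross-attention.

    Each query attends over all context tokens; the context serves as both
    keys and values (no learned projections, for illustration only).
    """
    dim = len(context[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in context
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, context)) / total for d in range(dim)]
        )
    return outputs


random.seed(0)
dim = 4
# Hypothetical token sets: 3 coarse-level tokens, 6 finer-level queries.
coarse_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
fine_queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]

fine_tokens = cross_attend(fine_queries, coarse_tokens)
# Listing the coarse tokens in a different order leaves the result unchanged
# up to floating-point rounding, because softmax-weighted sums ignore order.
reordered = cross_attend(fine_queries, coarse_tokens[::-1])
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(fine_tokens, reordered)
    for a, b in zip(row_a, row_b)
)
print(f"largest difference after reordering: {max_diff:.2e}")
```

In this toy setup the sequential structure comes only from moving between levels of detail (coarse to fine), while tokens within a level are treated as a set.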
Taxonomy
Topics: Graph Theory and Algorithms · Robotics and Sensor-Based Localization
