LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework
Xin Kang, Zihan Zheng, Lei Chu, Yue Gao, Jiahao Li, Hao Pan, Xuejin Chen, Yan Lu

TL;DR
LTM3D introduces a novel framework combining diffusion and auto-regressive models in token space for flexible, high-fidelity conditional 3D shape generation across multiple representations.
Contribution
It proposes a new Latent Token space Modeling framework with Prefix Learning and Reconstruction-Guided Sampling, enhancing dependency learning and structural fidelity in 3D shape generation.
Findings
Outperforms existing methods in prompt fidelity and structural accuracy
Supports multiple 3D representations including signed distance fields and point clouds
Demonstrates effectiveness in image- and text-conditioned shape generation
Abstract
We present LTM3D, a Latent Token space Modeling framework for conditional 3D shape generation that integrates the strengths of diffusion and auto-regressive (AR) models. While diffusion-based methods effectively model continuous latent spaces and AR models excel at capturing inter-token dependencies, combining these paradigms for 3D shape generation remains a challenge. To address this, LTM3D features a Conditional Distribution Modeling backbone, leveraging a masked autoencoder and a diffusion model to enhance token dependency learning. Additionally, we introduce Prefix Learning, which aligns condition tokens with shape latent tokens during generation, improving flexibility across modalities. We further propose a Latent Token Reconstruction module with Reconstruction-Guided Sampling to reduce uncertainty and enhance structural fidelity in generated shapes. Our approach operates in token…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
Topics3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques
MethodsDiffusion
