LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework

Xin Kang; Zihan Zheng; Lei Chu; Yue Gao; Jiahao Li; Hao Pan; Xuejin Chen; Yan Lu

arXiv:2505.24245 · cs.CV · June 2, 2025

Open Access

TL;DR

LTM3D introduces a novel framework combining diffusion and auto-regressive models in token space for flexible, high-fidelity conditional 3D shape generation across multiple representations.

Contribution

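The paper proposes LTM3D, a Latent Token space Modeling framework whose Conditional Distribution Modeling backbone strengthens token dependency learning, with Prefix Learning for conditioning across modalities and Reconstruction-Guided Sampling for structural fidelity in generated 3D shapes.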

Findings

01. Outperforms existing methods in prompt fidelity and structural accuracy

02. Supports multiple 3D representations including signed distance fields and point clouds

03. Demonstrates effectiveness in image- and text-conditioned shape generation

Abstract

We present LTM3D, a Latent Token space Modeling framework for conditional 3D shape generation that integrates the strengths of diffusion and auto-regressive (AR) models. While diffusion-based methods effectively model continuous latent spaces and AR models excel at capturing inter-token dependencies, combining these paradigms for 3D shape generation remains a challenge. To address this, LTM3D features a Conditional Distribution Modeling backbone, leveraging a masked autoencoder and a diffusion model to enhance token dependency learning. Additionally, we introduce Prefix Learning, which aligns condition tokens with shape latent tokens during generation, improving flexibility across modalities. We further propose a Latent Token Reconstruction module with Reconstruction-Guided Sampling to reduce uncertainty and enhance structural fidelity in generated shapes. Our approach operates in token…
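As a rough illustration of how condition tokens and shape latent tokens might share one sequence, the sketch below prepends condition tokens as a prefix and masks a random subset of shape tokens, which is the kind of input a masked autoencoder would see before a diffusion model predicts the masked tokens. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_masked_input`, the zero-vector mask token, the 50% mask ratio, and the toy token values are all hypothetical and do not come from the paper.

```python
import random

def build_masked_input(cond_tokens, shape_tokens, mask_ratio, mask_token, rng):
    """Prepend condition tokens as a prefix and mask a random subset of shape tokens.

    Hypothetical helper for illustration only; the paper's actual masking
    schedule and token layout are not specified in the abstract.
    """
    n = len(shape_tokens)
    num_masked = round(mask_ratio * n)
    masked_idx = sorted(rng.sample(range(n), num_masked))
    masked_set = set(masked_idx)

    # Replace the selected shape tokens with the mask token.
    body = [mask_token if i in masked_set else tok
            for i, tok in enumerate(shape_tokens)]

    # Prefix tokens are never masked, so masked positions are offset
    # by the prefix length in the combined sequence.
    sequence = list(cond_tokens) + body
    targets = [(len(cond_tokens) + i, shape_tokens[i]) for i in masked_idx]
    return sequence, targets

# Toy data: 2 condition tokens and 8 shape latent tokens, each 2-dimensional.
rng = random.Random(0)
cond = [[0.1, 0.2], [0.3, 0.4]]
shape = [[float(i), float(i) + 0.5] for i in range(8)]
MASK = [0.0, 0.0]  # assumed placeholder; a real model would learn this vector

seq, targets = build_masked_input(cond, shape, 0.5, MASK, rng)
print(len(seq), [pos for pos, _ in targets])
```

Each entry in `targets` pairs a sequence position with the original token that a downstream predictor would be trained to recover, while the condition prefix stays visible at every step.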

Taxonomy

Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques

Methods: Diffusion