Evaluating Latent Generative Paradigms for High-Fidelity 3D Shape Completion from a Single Depth Image
Matthias Humt, Ulrich Hillenbrand, Rudolph Triebel

TL;DR
This paper compares diffusion models and autoregressive transformers for 3D shape completion from a single depth image. The diffusion model with continuous latents performs best, while the autoregressive model is competitive when both operate on the same discrete latent space.
Contribution
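The paper provides a quantitative evaluation of two leading generative paradigms, Denoising Diffusion Probabilistic Models and Autoregressive Causal Transformers, on generative shape modeling and shape completion, alongside a discriminative baseline and an ablation study.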
Findings
Diffusion models outperform autoregressive models in multi-modal shape completion.
Autoregressive models can match diffusion performance on the same discrete latent space.
Diffusion models achieve state-of-the-art results on noisy depth image completion.
Abstract
While generative models have seen significant adoption across a wide range of data modalities, including 3D data, a consensus on which model is best suited for which task has yet to be reached. Further, conditional information such as text and images is frequently employed to steer the generation process, whereas other forms of conditioning, like partial 3D data, have not been thoroughly evaluated. In this work, we compare two of the most promising generative models--Denoising Diffusion Probabilistic Models and Autoregressive Causal Transformers--which we adapt for the tasks of generative shape modeling and completion. We conduct a thorough quantitative evaluation and comparison on both tasks, including a baseline discriminative model and an extensive ablation study. Our results show that (1) the diffusion model with continuous latents outperforms both the discriminative model and the autoregressive…
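To make the contrast between the two paradigms concrete, here is a minimal sketch of the textbook building blocks behind each: the closed-form DDPM forward (noising) process on a continuous latent vector, and the causal mask and chain-rule factorization used by an autoregressive transformer over discrete latent tokens. This is not the authors' implementation. The linear beta schedule, its default values, and all function names below are assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas):
    """Cumulative product of (1 - beta_t) for the closed-form forward process."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(a_bar_t) * z_0, (1 - a_bar_t) * I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * z + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for z in z0]


def causal_mask(seq_len):
    """mask[i][j] is True when token i may attend to token j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sequence_log_prob(tokens, next_token_probs):
    """Autoregressive factorization: log p(x) = sum_i log p(x_i | x_<i).

    next_token_probs is a hypothetical callable mapping a token prefix to a
    probability list over the vocabulary; a trained transformer would fill
    this role.
    """
    return sum(
        math.log(next_token_probs(tokens[:i])[tok]) for i, tok in enumerate(tokens)
    )
```

The diffusion side perturbs a continuous latent with Gaussian noise and learns to reverse that process, whereas the autoregressive side predicts one discrete token at a time, conditioned only on earlier tokens. That difference is why the choice of continuous versus discrete latent space matters when comparing the two.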
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
