POC-SLT: Partial Object Completion with SDF Latent Transformers

Faezeh Zakeri; Raphael Braun; Lukas Ruppert; Henrik P.A. Lensch

arXiv:2411.05419·cs.CV·November 11, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces POC-SLT, a transformer-based method operating on latent SDF patches for 3D shape completion from partial data, showing significant improvements over existing methods.

Contribution

It proposes a novel transformer approach on latent SDF patches for 3D shape completion, leveraging a VAE for smooth latent encoding and outperforming state-of-the-art techniques.

Findings

1. Outperforms baseline methods in shape completion quality
2. Effective on partial observations from ShapeNet and ABC datasets
3. Significant quantitative and qualitative improvements

Abstract

3D geometric shape completion hinges on representation learning and a deep understanding of geometric data. Without profound insights into the three-dimensional nature of the data, this task remains unattainable. Our work addresses this challenge of 3D shape completion given partial observations by proposing a transformer operating on the latent space representing Signed Distance Fields (SDFs). Instead of a monolithic volume, the SDF of an object is partitioned into smaller high-resolution patches leading to a sequence of latent codes. The approach relies on a smooth latent space encoding learned via a variational autoencoder (VAE), trained on millions of 3D patches. We employ an efficient masked autoencoder transformer to complete partial sequences into comprehensive shapes in latent space. Our approach is extensively evaluated on partial observations from ShapeNet and the ABC dataset…
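The patch partitioning described in the abstract can be illustrated with a minimal sketch. It assumes a cubic SDF grid stored as a NumPy array and splits it into non-overlapping cubic patches, giving the sequence that a patch encoder would turn into latent codes. The function names, the row-major token order, and the toy sizes are assumptions made for illustration. They are not taken from the paper's implementation, and the VAE and the transformer are omitted.

```python
import numpy as np


def partition_into_patches(volume, patch):
    """Split a cubic grid (D, D, D) into non-overlapping patch^3 blocks.

    Returns an array of shape (n^3, patch, patch, patch) with n = D // patch.
    Token t = i*n*n + j*n + k holds the block at grid position (i, j, k).
    (Hypothetical helper; the ordering is an illustrative choice.)
    """
    d = volume.shape[0]
    if volume.shape != (d, d, d) or d % patch != 0:
        raise ValueError("volume must be cubic and divisible by the patch size")
    n = d // patch
    # Axes after reshape: (i, a, j, b, k, c) with voxel = (i*p+a, j*p+b, k*p+c).
    blocks = volume.reshape(n, patch, n, patch, n, patch)
    # Bring the block indices (i, j, k) to the front, local offsets (a, b, c) last.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(n ** 3, patch, patch, patch)


def merge_patches(patches, n):
    """Inverse of partition_into_patches for an n x n x n grid of patches."""
    patch = patches.shape[1]
    blocks = patches.reshape(n, n, n, patch, patch, patch)
    # Interleave block indices and local offsets back into voxel order.
    blocks = blocks.transpose(0, 3, 1, 4, 2, 5)
    return blocks.reshape(n * patch, n * patch, n * patch)


# Toy example: an 8^3 grid split into a 2 x 2 x 2 grid of 4^3 patches.
volume = np.arange(8 ** 3, dtype=np.float32).reshape(8, 8, 8)
tokens = partition_into_patches(volume, patch=4)
print(tokens.shape)  # (8, 4, 4, 4)
```

If the patches are 32^3 and the token grid is 8x8x8, as Reviewer 02's comments suggest, the same call would turn a 256^3 grid into 512 tokens, assuming the patches do not overlap.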

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 5

Strengths

1. The paper is easy to understand. 2. The experimental results are better than the compared methods.

Weaknesses

The novelty of this paper is very limited.
1. The division of high-resolution voxels into smaller patches is reasonable but straightforward, and thus of limited innovation.
2. The p-VAE is almost identical to the original VAE, without any adaptation or improvement for this task.
3. The SDF-Latent-Transformer idea is very similar to the masked autoencoder [1], so it lacks novelty.
In short, the method proposed in this paper is somewhat like a combination of different well-known models, ther

Reviewer 02 · Rating 5 · Confidence 5

Strengths

1. The authors proposed an efficient architecture that runs much faster than previous diffusion-based methods and auto-regressive-based methods and still demonstrated decent quantitative and qualitative results on shape completion benchmarks. The efficient design and fast running speed are appreciated for possible real-world applications.
2. This paper addressed an interesting problem of 3D shape completion, which could lead to possible applications in 3D reconstruction and robotics.
3. The au

Weaknesses

1. Lack of an ablation study on the patch size and the number of patches. The 32^3 SDF volume seems to be a relatively large SDF with lots of information. It would be good to have an ablation study with smaller patch sizes, such as 16^3 or 8^3.
2. The presentation of the proposed method is vague and confusing. It would be much easier for the reader to understand if the authors pointed out that they only use a transformer with 8x8x8 context length, and each token encodes the information of a 3

Reviewer 03 · Rating 3 · Confidence 3

Strengths

* Fast inference time: a key advantage over other sequential (e.g., autoregressive) approaches is utilizing the MAE decoder.
* Modular Patch-VAE architecture enables generalization by pre-training on small-scale patches, an effective component, even if previously used in related works.
* High-quality shape completion, especially in capturing fine-grained details.
* Simple yet effective approach that avoids unnecessary complexity with a straightforward architecture and objective.

Weaknesses

- Potential "leakage" issue: Non-masked voxels adjacent to masked patches may encode distances to missing parts, indirectly leaking information about regions to be completed. Discussing this limitation and potential remedies (e.g., using TSDF instead of SDF) would strengthen the work. Beyond an ablation study of SDF versus TSDF, it should first be checked qualitatively how much information about the masked patches is in fact encapsulated in the non-masked patches. Masking in
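The leakage concern raised here can be made concrete with a toy one-dimensional sketch. It is not an experiment from the paper. It assumes a 1D "object" that occupies everything at or beyond a surface position `s`, so the signed distance at `x` is `s - x`, and it assumes that positions `0..7` are visible while the surface lies in the masked region. The function names and the truncation distance `tau` are hypothetical.

```python
def sdf_1d(x, surface):
    """Signed distance to a 1D half-line object occupying x >= surface.

    Positive outside the object, negative inside. (Toy model, an assumption.)
    """
    return surface - x


def truncate(distance, tau):
    """Clamp a signed distance to [-tau, tau], as a TSDF does."""
    return max(-tau, min(tau, distance))


visible = range(8)  # visible voxel positions; x >= 8 is treated as masked

# Two different hidden surfaces, both inside the masked region.
sdf_a = [sdf_1d(x, 12.0) for x in visible]
sdf_b = [sdf_1d(x, 14.0) for x in visible]

# Untruncated SDF: visible values differ, so they reveal where the
# hidden surface is.
print(sdf_a != sdf_b)  # True

# Truncated SDF with tau = 2: every visible voxel is farther than tau
# from either surface, so both cases look identical.
tsdf_a = [truncate(d, 2.0) for d in sdf_a]
tsdf_b = [truncate(d, 2.0) for d in sdf_b]
print(tsdf_a == tsdf_b)  # True
```

In this toy setting truncation removes the leak only because the hidden surface is more than `tau` away from every visible voxel. A surface closer than `tau` to the mask boundary would still be partly encoded in the visible values.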

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification

Methods: Approximate Bayesian Computation