PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images

Yiheng Xiong; Angela Dai

arXiv:2405.11914 · cs.CV · November 5, 2024

TL;DR

This paper introduces PT43D, a probabilistic transformer model that generates 3D shapes from single RGB images, effectively handling occlusions and truncations by using simulated training data and cross-attention mechanisms, outperforming existing methods.

Contribution

The paper presents a novel transformer-based autoregressive model for 3D shape generation from ambiguous images, with a focus on realistic occlusion and truncation scenarios, and demonstrates superior performance over state-of-the-art approaches.

Findings

01. Outperforms existing methods on both synthetic and real-world data.

02. Effectively handles occlusion and truncation in input images.

03. Uses cross-attention to identify relevant image regions for shape generation.

Abstract

Generating 3D shapes from single RGB images is essential in various applications such as robotics. Current approaches typically target images containing clear and complete visual descriptions of the object, without considering common realistic cases where observations of objects are largely occluded or truncated. We thus propose a transformer-based autoregressive model to generate the probabilistic distribution of 3D shapes conditioned on an RGB image containing potentially highly ambiguous observations of the object. To handle realistic scenarios such as occlusion or field-of-view truncation, we create simulated image-to-shape training pairs that enable improved fine-tuning for real-world scenarios. We then adopt cross-attention to effectively identify the most relevant region of interest from the input image for shape generation. This enables inference of sampled shapes with…
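
The cross-attention step mentioned in the abstract can be illustrated with a minimal sketch of scaled dot-product cross-attention, in which shape-token queries attend over image-patch features. This is not the authors' implementation: the function names, the 4x4 patch grid, the feature dimension, and the random inputs are all assumptions chosen for illustration, and the learned query/key/value projections of a real transformer layer are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (no learned projections).

    queries: list of query vectors, e.g. shape-token embeddings (hypothetical)
    keys, values: lists of vectors, e.g. image-patch features (hypothetical)
    Returns (outputs, weights); weights[i][j] is how strongly query i
    attends to patch j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every patch, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Attention-weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


random.seed(0)
dim = 8  # assumed feature dimension
# Assumed 4x4 grid of image-patch features, flattened to 16 vectors.
patches = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
# Three assumed shape-token embeddings acting as queries.
shape_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

outputs, weights = cross_attention(shape_tokens, patches, patches)
# Index of the patch each shape token attends to most strongly.
top_patch = [max(range(len(w)), key=w.__getitem__) for w in weights]
```

Each row of `weights` is a distribution over image patches, so inspecting `top_patch` shows which region a given shape token relies on most. In an occluded or truncated image, one would expect a trained model to place this mass on the visible parts of the object, though this sketch uses random features and demonstrates only the mechanism.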

Code & Models

Repositories

xiongyiheng/pt43d
PyTorch · Official

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Medical Image Segmentation Techniques · Advanced Vision and Imaging