PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images
Yiheng Xiong, Angela Dai

TL;DR
This paper introduces PT43D, a probabilistic transformer model that generates 3D shapes from single RGB images, effectively handling occlusions and truncations by using simulated training data and cross-attention mechanisms, outperforming existing methods.
Contribution
The paper presents a novel transformer-based autoregressive model for 3D shape generation from ambiguous images, with a focus on realistic occlusion and truncation scenarios, and demonstrates superior performance over state-of-the-art approaches.
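To make the autoregressive idea concrete, here is a minimal sketch of sampling several shape hypotheses for one image. It assumes the shape is represented as a short sequence of discrete tokens, which this summary does not state. `next_token_probs` is a hypothetical stand-in for a trained transformer, and the integer `image_feature` is a toy placeholder for real image conditioning.

```python
import random

def next_token_probs(prefix, image_feature, vocab_size):
    """Hypothetical stand-in for a trained model: returns a distribution over
    the next token given the tokens sampled so far and an image feature."""
    # Toy rule (assumption): favour the token that follows the previous one,
    # starting from the image feature when the prefix is empty.
    last = prefix[-1] if prefix else image_feature
    favoured = (last + 1) % vocab_size
    rest = 0.3 / (vocab_size - 1)
    return [0.7 if t == favoured else rest for t in range(vocab_size)]

def sample_shape_tokens(image_feature, length, vocab_size, rng):
    """Sample one token sequence, one token at a time, each conditioned on
    the previously sampled tokens and the image feature."""
    tokens = []
    for _ in range(length):
        probs = next_token_probs(tokens, image_feature, vocab_size)
        tokens.append(rng.choices(range(vocab_size), weights=probs, k=1)[0])
    return tokens

# Drawing several samples for the same image yields several plausible shapes.
rng = random.Random(0)
hypotheses = [
    sample_shape_tokens(image_feature=2, length=6, vocab_size=4, rng=rng)
    for _ in range(3)
]
```

Because each token is drawn from a distribution rather than chosen greedily, repeated sampling can return different sequences for the same ambiguous input.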
Findings
Outperforms existing methods on both synthetic and real-world data.
Effectively handles occlusion and truncation in input images.
Uses cross-attention to identify relevant image regions for shape generation.
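The cross-attention finding above can be illustrated with a small single-head sketch in plain Python. This is generic scaled dot-product cross-attention, not the paper's exact architecture. The query, key, and value vectors are made-up toy numbers standing in for shape-side and image-patch features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: one vector per shape token (hypothetical shape-side features)
    keys, values: one vector per image patch (hypothetical image features)
    Returns (outputs, weights); weights[i][j] is how strongly shape token i
    attends to image patch j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        out = [
            sum(w[j] * values[j][d] for j in range(len(values)))
            for d in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy example: three image patches and one shape token whose query aligns
# with the key of patch 1.
patch_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
patch_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
shape_queries = [[0.0, 4.0]]
outputs, weights = cross_attention(shape_queries, patch_keys, patch_values)
most_relevant_patch = max(range(len(patch_keys)), key=lambda j: weights[0][j])
```

The attention weights form a distribution over image patches, so inspecting them shows which regions contribute most to each generated shape token.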
Abstract
Generating 3D shapes from single RGB images is essential in various applications such as robotics. Current approaches typically target images containing clear and complete visual descriptions of the object, without considering common realistic cases where observations of objects are largely occluded or truncated. We thus propose a transformer-based autoregressive model to generate the probabilistic distribution of 3D shapes conditioned on an RGB image containing potentially highly ambiguous observations of the object. To handle realistic scenarios such as occlusion or field-of-view truncation, we create simulated image-to-shape training pairs that enable improved fine-tuning for real-world scenarios. We then adopt cross-attention to effectively identify the most relevant region of interest from the input image for shape generation. This enables inference of sampled shapes with…
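As a rough illustration of what occluded and truncated observations look like, the sketch below applies a rectangular mask and a column crop to a tiny image stored as nested lists. This is a simple 2D stand-in and an assumption. The summary does not describe how the paper's simulated training pairs are actually produced, and the helper names `occlude` and `truncate` are hypothetical.

```python
def occlude(image, top, left, height, width, fill=0):
    """Return a copy of the image with a rectangle overwritten by `fill`,
    mimicking another object blocking part of the view."""
    out = [row[:] for row in image]
    for r in range(max(top, 0), min(top + height, len(out))):
        for c in range(max(left, 0), min(left + width, len(out[r]))):
            out[r][c] = fill
    return out

def truncate(image, keep_cols):
    """Mimic field-of-view truncation by keeping only the leftmost columns."""
    return [row[:keep_cols] for row in image]

# A 4x4 toy "image" with pixel values 1..16.
image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
occluded = occlude(image, top=1, left=1, height=2, width=2)
truncated = truncate(image, keep_cols=2)
```

Pairing each degraded image with the unmodified ground-truth shape gives training examples in which the input is ambiguous but the target is complete.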
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Medical Image Segmentation Techniques · Advanced Vision and Imaging
