3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes
Alara Dirik, Pinar Yanardag

TL;DR
This paper introduces 3D-LatentMapper, a novel framework that uses Vision Transformer and CLIP to enable fast, view-agnostic single-view 3D shape reconstruction, even with occlusions.
Contribution
The paper presents a new mapping network architecture that learns to map deep features extracted from ViT and CLIP into the latent space of a base 3D generative model, enabling view-agnostic reconstruction.
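As a rough illustration of what such a mapping network might look like, the sketch below fuses a ViT feature vector and a CLIP feature vector and projects them to a latent code. This is not the authors' architecture: fusion by concatenation, the two-layer MLP, the name `MappingNetwork`, and all dimensions are assumptions chosen for illustration, and the random vectors stand in for real encoder outputs.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Create a randomly initialised (out_dim x in_dim) weight matrix and a zero bias."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, layer):
    """Apply an affine layer y = Wx + b to a plain Python list."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class MappingNetwork:
    """Hypothetical mapper: (ViT features, CLIP features) -> latent code.

    Assumption: the two feature vectors are concatenated and passed through
    a small MLP. The source does not specify the fusion or layer layout.
    """

    def __init__(self, vit_dim, clip_dim, hidden_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.vit_dim = vit_dim
        self.clip_dim = clip_dim
        self.fc1 = make_linear(vit_dim + clip_dim, hidden_dim, rng)
        self.fc2 = make_linear(hidden_dim, latent_dim, rng)

    def __call__(self, vit_feat, clip_feat):
        if len(vit_feat) != self.vit_dim or len(clip_feat) != self.clip_dim:
            raise ValueError("feature dimensions do not match the network")
        fused = list(vit_feat) + list(clip_feat)  # simple concatenation
        return linear(relu(linear(fused, self.fc1)), self.fc2)


# Placeholder features; in practice these would come from pretrained encoders.
feature_rng = random.Random(1)
vit_feat = [feature_rng.gauss(0, 1) for _ in range(8)]
clip_feat = [feature_rng.gauss(0, 1) for _ in range(4)]

mapper = MappingNetwork(vit_dim=8, clip_dim=4, hidden_dim=16, latent_dim=6)
latent = mapper(vit_feat, clip_feat)
print(len(latent))  # prints 6, the assumed latent dimensionality
```

In a setup like this, the predicted latent code would be handed to a pretrained 3D generator to produce the shape, so only the small mapper needs to run at inference time.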
Findings
Effective reconstruction of 3D shapes from single views.
Robust performance with occlusions.
Outperforms state-of-the-art methods on ShapeNetV2.
Abstract
Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to represent and generate 3D shapes, as well as a vast number of use cases. However, single-view reconstruction remains a challenging topic that can unlock various interesting use cases such as interactive design. In this work, we propose a novel framework that leverages the intermediate latent spaces of Vision Transformer (ViT) and a joint image-text representational model, CLIP, for fast and efficient Single View Reconstruction (SVR). More specifically, we propose a novel mapping network architecture that learns a mapping between deep features extracted from ViT and CLIP, and the latent space of a base 3D generative model. Unlike previous work, our method enables view-agnostic reconstruction of 3D shapes, even in the presence of large occlusions. We use the ShapeNetV2 dataset and perform…
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Linear Layer · Contrastive Language-Image Pre-training · Balanced Selection · Dense Connections
