3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes

Alara Dirik; Pinar Yanardag

arXiv:2212.02184 · cs.CV · December 6, 2022 · 1 citation

TL;DR

This paper introduces 3D-LatentMapper, a framework that uses the intermediate latent features of a Vision Transformer (ViT) and CLIP to enable fast, view-agnostic single-view 3D shape reconstruction, even in the presence of large occlusions.

Contribution

It presents a new mapping network architecture that maps deep features extracted from ViT and CLIP into the latent space of a base 3D generative model, enabling view-agnostic reconstruction.

Findings

01. Effective reconstruction of 3D shapes from single views.
02. Robust performance with occlusions.
03. Outperforms state-of-the-art methods on ShapeNetV2.

Abstract

Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to represent and generate 3D shapes, as well as a vast number of use cases. However, single-view reconstruction remains a challenging topic that can unlock various interesting use cases such as interactive design. In this work, we propose a novel framework that leverages the intermediate latent spaces of Vision Transformer (ViT) and a joint image-text representational model, CLIP, for fast and efficient Single View Reconstruction (SVR). More specifically, we propose a novel mapping network architecture that learns a mapping between deep features extracted from ViT and CLIP, and the latent space of a base 3D generative model. Unlike previous work, our method enables view-agnostic reconstruction of 3D shapes, even in the presence of large occlusions. We use the ShapeNetV2 dataset and perform…
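The mapping idea described in the abstract (deep ViT and CLIP features mapped into the latent space of a base 3D generative model) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the paper's actual mapper architecture, feature dimensions, and training loss are not stated in this summary. Here a two-layer MLP concatenates a ViT feature vector with a CLIP image embedding and regresses a latent code under a mean-squared-error loss. The names `init_mapper`, `map_features`, `loss_and_grads`, and `sgd_step`, and the sizes `VIT_DIM`, `CLIP_DIM`, `HIDDEN_DIM`, `LATENT_DIM`, are hypothetical, and random NumPy arrays stand in for real ViT/CLIP features and generator latents.

```python
import numpy as np

# Hypothetical feature/latent sizes (assumptions, not taken from the paper).
VIT_DIM, CLIP_DIM, HIDDEN_DIM, LATENT_DIM = 768, 512, 1024, 256


def init_mapper(rng, in_dim, hidden_dim, out_dim, scale=0.02):
    """Two-layer MLP mapper with small Gaussian weights (hypothetical stand-in)."""
    return {
        "W1": rng.normal(0.0, scale, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, scale, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def _forward(params, vit_feat, clip_feat):
    # Fuse the two feature streams by concatenating along the feature axis.
    x = np.concatenate([vit_feat, clip_feat], axis=-1)
    pre = x @ params["W1"] + params["b1"]
    h = np.maximum(0.0, pre)  # ReLU
    pred = h @ params["W2"] + params["b2"]
    return x, pre, h, pred


def map_features(params, vit_feat, clip_feat):
    """Map (batch, vit_dim) and (batch, clip_dim) features to (batch, latent_dim) codes."""
    return _forward(params, vit_feat, clip_feat)[-1]


def loss_and_grads(params, vit_feat, clip_feat, target_latent):
    """Mean squared error to target latent codes, with hand-derived gradients."""
    x, pre, h, pred = _forward(params, vit_feat, clip_feat)
    diff = pred - target_latent
    loss = float(np.mean(diff ** 2))
    d_pred = 2.0 * diff / diff.size                 # dL/dpred for the mean over all entries
    d_pre = (d_pred @ params["W2"].T) * (pre > 0)   # back-propagate through the ReLU
    grads = {
        "W2": h.T @ d_pred,
        "b2": d_pred.sum(axis=0),
        "W1": x.T @ d_pre,
        "b1": d_pre.sum(axis=0),
    }
    return loss, grads


def sgd_step(params, vit_feat, clip_feat, target_latent, lr=1e-2):
    """One plain gradient-descent update; returns (new_params, loss_before_update)."""
    loss, grads = loss_and_grads(params, vit_feat, clip_feat, target_latent)
    return {k: params[k] - lr * grads[k] for k in params}, loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_mapper(rng, VIT_DIM + CLIP_DIM, HIDDEN_DIM, LATENT_DIM)
    # Random arrays stand in for real ViT/CLIP features and generator latents.
    vit = rng.normal(size=(8, VIT_DIM))
    clip = rng.normal(size=(8, CLIP_DIM))
    target = rng.normal(size=(8, LATENT_DIM))
    for _ in range(100):
        params, loss = sgd_step(params, vit, clip, target, lr=0.1)
    print("final loss:", loss)
```

Concatenation followed by an MLP is only the simplest way to fuse the two feature streams. Turning a predicted code into an actual shape would additionally require the base 3D generative model's decoder, which this sketch does not model.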

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Linear Layer · Contrastive Language-Image Pre-training · Balanced Selection · Dense Connections