Multi-view 3D Reconstruction with Transformer
Dan Wang, Xinrui Cui, Xun Chen, Zhengxia Zou, Tianyang Shi, Septimiu Salcudean, Z. Jane Wang, Rabab Ward

TL;DR
This paper introduces 3D Volume Transformer (VolT), a novel Transformer-based framework that unifies feature extraction and view fusion for multi-view 3D object reconstruction, achieving state-of-the-art accuracy with fewer parameters.
Contribution
It reformulates multi-view 3D reconstruction as a sequence prediction task using Transformers, exploring view relationships more effectively than CNN-based methods.
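Here is a concrete view-to-view example. The sketch below runs single-head scaled dot-product self-attention over a few per-view feature vectors. It is a generic illustration, not VolT's actual layer. The toy 4-dimensional features, the random projection matrices wq, wk and wv, and the helper names (self_attention, fuse) are all hypothetical. No positional encoding is added across views, so shuffling the input views only shuffles the outputs. Mean pooling then gives a fused feature that does not depend on view order.

```python
import math
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over view tokens.

    No positional encoding is added, so the views are treated as an unordered
    set: permuting the inputs permutes the outputs in the same way.
    """
    d = len(wq)  # query/key dimension = number of rows in wq
    queries = [matvec(wq, t) for t in tokens]
    keys = [matvec(wk, t) for t in tokens]
    values = [matvec(wv, t) for t in tokens]
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted mix of every view's value vector.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

def fuse(outputs):
    """Mean-pool the attended view tokens into one order-independent feature."""
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dim = 4
wq, wk, wv = (random_matrix(dim, dim, rng) for _ in range(3))
# Three hypothetical per-view feature vectors.
views = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]

out = self_attention(views, wq, wk, wv)
shuffled = [views[2], views[0], views[1]]
out_shuffled = self_attention(shuffled, wq, wk, wv)
```

Mean pooling is only one simple order-independent way to fuse the attended tokens. It is an assumption for illustration, not the fusion rule described in the paper.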
Findings
Achieves new state-of-the-art accuracy on the ShapeNet dataset.
Uses 70% fewer parameters than CNN-based methods.
Demonstrates strong scalability of the proposed method.
Abstract
Deep CNN-based methods have so far achieved state-of-the-art results in multi-view 3D object reconstruction. Despite this considerable progress, the two core modules of these methods, multi-view feature extraction and fusion, are usually investigated separately, and the relations of the object across different views are rarely explored. In this paper, inspired by the recent success of self-attention-based Transformer models, we reformulate multi-view 3D reconstruction as a sequence-to-sequence prediction problem and propose a new framework named 3D Volume Transformer (VolT) for this task. Unlike previous CNN-based methods, which design these two modules separately, we unify feature extraction and view fusion in a single Transformer network. A natural advantage of this design is that self-attention among multiple unordered inputs lets the model explore view-to-view relationships. On ShapeNet, a…
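One way to picture several views as a single sequence-to-sequence input is to tokenize them jointly. Cut every view into fixed-size patches, flatten each patch into a token, and concatenate the tokens of all views into one sequence. A single Transformer can then attend over that sequence, so within-view feature extraction and cross-view fusion share the same attention layers. This is a hypothetical illustration under stated assumptions (grayscale images as nested lists, a 2x2 patch size, and the helper names patchify and build_sequence). It is not VolT's published tokenizer. Each token keeps a position index inside its own view, which is one possible place to attach absolute position encodings. No view index is attached, so nothing in the tokens themselves ranks one view above another.

```python
from typing import List, Tuple

Image = List[List[float]]  # hypothetical: a grayscale image as rows of pixel values
Token = Tuple[List[float], int]  # (flattened patch, position within its own view)

def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles, each flattened row-major."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append([image[r][c]
                          for r in range(top, top + patch)
                          for c in range(left, left + patch)])
    return tiles

def build_sequence(views: List[Image], patch: int) -> List[Token]:
    """Concatenate the patch tokens of every view into one sequence.

    Tokens carry only their within-view position; no view index is stored.
    """
    sequence: List[Token] = []
    for image in views:
        for pos, tile in enumerate(patchify(image, patch)):
            sequence.append((tile, pos))
    return sequence

# Three toy 4x4 views with distinct pixel values.
views = [[[float(v * 16 + r * 4 + c) for c in range(4)] for r in range(4)]
         for v in range(3)]
seq = build_sequence(views, patch=2)  # 3 views x 4 patches = 12 tokens
```

In a typical Vision-Transformer-style pipeline, each flattened patch would then be linearly projected to the embedding width before entering the Transformer. That step is omitted here.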
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Adam · Dense Connections · Softmax · Layer Normalization · Dropout
