Efficient 3D Object Reconstruction using Visual Transformers
Rohan Agarwal, Wei Zhou, Xiaofeng Wu, Yuhan Li

TL;DR
This paper explores using visual transformers instead of traditional convolutional methods for 3D object reconstruction from 2D images, demonstrating comparable or improved accuracy and efficiency.
Contribution
It introduces a transformer-based encoder-decoder architecture that predicts 3D structure from 2D images, and shows that it can match or exceed the accuracy of the convolutional baseline on this task.
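To make the encoder-decoder idea concrete, here is a minimal NumPy sketch of the general pattern a visual transformer follows: split the image into patches, embed them as tokens, mix them with self-attention, and decode to a 3D output. It is not the authors' implementation. The image size, patch size, embedding width, single attention head, random (untrained) weights, and the 8x8x8 voxel-occupancy output are all assumptions chosen for illustration; the summary above only says the model predicts "3D structure".

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patch tokens."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly into patches"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c) -> (gh*gw, patch*patch*c)
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a (N, D) token matrix."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


rng = np.random.default_rng(0)

# Hypothetical input: one 32x32 RGB image, cut into 8x8 patches (16 tokens).
image = rng.random((32, 32, 3))
tokens = patchify(image, patch=8)                      # shape (16, 192)

# Encoder: linear patch embedding followed by one attention layer.
d_model = 64                                           # assumed embedding width
w_embed = rng.normal(scale=0.02, size=(tokens.shape[1], d_model))
wq, wk, wv = (rng.normal(scale=0.02, size=(d_model, d_model)) for _ in range(3))
encoded = self_attention(tokens @ w_embed, wq, wk, wv)  # shape (16, 64)

# Decoder (assumed form): pool the tokens and project to voxel occupancy.
grid = 8                                               # assumed 8x8x8 output grid
w_out = rng.normal(scale=0.02, size=(d_model, grid ** 3))
logits = encoded.mean(axis=0) @ w_out
voxels = (1.0 / (1.0 + np.exp(-logits))).reshape(grid, grid, grid)
```

A real model would stack several such layers, add positional embeddings, and learn the weights from paired images and 3D shapes; the sketch only shows how image patches become tokens and how a token representation can be mapped to a 3D grid.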
Findings
Transformer-based models achieve similar or better accuracy than convolutional baselines.
Transformers offer improved training efficiency over traditional convolutional methods.
The study provides evidence of transformers' potential in 3D vision tasks.
Abstract
Reconstructing a 3D object from a 2D image is a well-researched vision problem, with many kinds of deep learning techniques having been tried. Most commonly, 3D convolutional approaches are used, though previous work has shown state-of-the-art methods using 2D convolutions that are also significantly more efficient to train. With the recent rise of transformers for vision tasks, often outperforming convolutional methods, along with some earlier attempts to use transformers for 3D object reconstruction, we set out to use visual transformers in place of convolutions in existing efficient, high-performing techniques for 3D object reconstruction in order to achieve superior results on the task. Using a transformer-based encoder and decoder to predict 3D structure from 2D images, we achieve accuracy similar or superior to the baseline approach. This study serves as evidence for the potential…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
