3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction
Leslie Ching Ow Tiong, Dick Sigmund, Andrew Beng Jin Teoh

TL;DR
The paper introduces 3D-C2FT, a transformer model with a coarse-to-fine attention mechanism that improves multi-view 3D reconstruction by better encoding features and correcting 3D surfaces, outperforming existing models.
Contribution
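It proposes a coarse-to-fine (C2F) attention mechanism within a transformer for multi-view 3D reconstruction, targeting the open problem of designing attention that explores multi-view features and exploits their relations to reinforce the encoding and decoding modules.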
Findings
Achieves notable results on the ShapeNet and Multi-view Real-life datasets.
Effectively encodes multi-view features and rectifies 3D surface defects.
Outperforms several competing models in accuracy and robustness.
Abstract
Recently, the transformer model has been successfully employed for the multi-view 3D reconstruction problem. However, challenges remain in designing an attention mechanism that explores the multi-view features and exploits their relations to reinforce the encoding-decoding modules. This paper proposes a new model, the 3D coarse-to-fine transformer (3D-C2FT), which introduces a novel coarse-to-fine (C2F) attention mechanism for encoding multi-view features and rectifying defective 3D objects. The C2F attention mechanism enables the model to learn multi-view information flow and synthesize 3D surface correction in a coarse to fine-grained manner. The proposed model is evaluated on the ShapeNet and Multi-view Real-life datasets. Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.
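To make the idea of attending "in a coarse to fine-grained manner" more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the architecture from the paper: the function names, the group sizes, the mean-pooling step, and the use of identity projections in place of learned query/key/value weights are all hypothetical choices made for brevity. Tokens are first averaged into large groups and attended over, then the groups shrink until attention runs over individual tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head scaled dot-product self-attention.
    # Assumption: identity projections stand in for learned Q/K/V weights.
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens

def coarse_to_fine_attention(tokens, group_sizes=(4, 2, 1)):
    # Hypothetical coarse-to-fine schedule: at each stage, average tokens
    # in groups of size g, attend over the pooled tokens, broadcast the
    # result back to every token in the group, and add it residually.
    n, d = tokens.shape
    out = tokens.copy()
    for g in group_sizes:
        if n % g != 0:
            raise ValueError("token count must be divisible by each group size")
        coarse = out.reshape(n // g, g, d).mean(axis=1)   # (n/g, d)
        attended = self_attention(coarse)                 # (n/g, d)
        out = out + np.repeat(attended, g, axis=0)        # back to (n, d)
    return out

# Usage: 8 feature tokens (e.g. one per input view), 16 dimensions each.
rng = np.random.default_rng(0)
view_tokens = rng.normal(size=(8, 16))
fused = coarse_to_fine_attention(view_tokens)
print(fused.shape)  # (8, 16)
```

With `group_sizes=(1,)` the function reduces to ordinary residual self-attention, which makes the coarse stages easy to compare against a plain transformer layer.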
Taxonomy
Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Advanced Vision and Imaging
