UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction

Zhenwei Zhu; Liying Yang; Ning Li; Chaohao Jiang; Yanyan Liang

arXiv:2302.13987 · cs.CV · August 21, 2023 · 1 citation

TL;DR

UMIFormer introduces a transformer-based approach that mines correlations between similar tokens across unstructured views to improve multi-view 3D reconstruction, outperforming state-of-the-art methods on ShapeNet.

Contribution

The paper proposes UMIFormer, a novel transformer network that effectively mines inter-view token correlations for unstructured multi-view 3D reconstruction.

Findings

01. Outperforms existing SOTA methods on ShapeNet
02. Effectively mines correlations between tokens from different views
03. Demonstrates adaptability to unstructured multiple images

Abstract

In recent years, many video tasks have achieved breakthroughs by utilizing the vision transformer and establishing spatial-temporal decoupling for feature extraction. Although multi-view 3D reconstruction also takes multiple images as input, it cannot directly inherit this success because the associations between unstructured views are completely ambiguous: no usable prior relationship exists that is comparable to the temporal coherence of a video. To solve this problem, we propose a novel transformer network for Unstructured Multiple Images (UMIFormer). It uses transformer blocks for decoupled intra-view encoding, and purpose-designed token rectification blocks that mine the correlation between similar tokens from different views to achieve decoupled inter-view encoding. Afterward, all tokens acquired from various branches are compressed into a fixed-size compact representation…
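To make the idea of mining correlations between similar tokens from different views more concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: the function names `cosine` and `rectify_tokens`, the use of cosine similarity, the neighbour count `k`, and the 50/50 residual blend are all assumptions chosen for illustration. The official repository uses PyTorch and may differ in every one of these details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def rectify_tokens(views, k=2):
    """Blend each token with the mean of its k most similar tokens from OTHER views.

    Hypothetical helper for illustration only.
    views: list of views; each view is a list of tokens; each token is a list of floats.
    Returns a new nested list with the same shape as `views`.
    """
    rectified = []
    for vi, view in enumerate(views):
        # Candidate neighbours: every token that belongs to a different view.
        candidates = [tok for vj, other in enumerate(views) if vj != vi for tok in other]
        new_view = []
        for tok in view:
            if not candidates:
                # A single view has no inter-view neighbours; keep the token unchanged.
                new_view.append(list(tok))
                continue
            # Rank candidates by similarity to the current token and keep the top k.
            nearest = sorted(candidates, key=lambda c: cosine(tok, c), reverse=True)[:k]
            # Element-wise mean of the selected neighbours.
            mean = [sum(vals) / len(nearest) for vals in zip(*nearest)]
            # Assumed residual-style update: equal blend of the token and its neighbours.
            new_view.append([0.5 * t + 0.5 * m for t, m in zip(tok, mean)])
        rectified.append(new_view)
    return rectified


if __name__ == "__main__":
    # Two views with one 2-D token each; each token is pulled toward the other view's token.
    print(rectify_tokens([[[1.0, 0.0]], [[0.0, 1.0]]], k=1))
```

Note that the neighbour search never uses the order of the views or any positional relationship between them, only feature similarity. That is the property that matters for unstructured inputs, where no prior such as temporal coherence is available.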

Code & Models

Repositories

garyzhu1996/umiformer (PyTorch, official)

Videos

UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction · YouTube

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Advanced Image Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Vision Transformer