VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction

Jisan Mahmud; Jan-Michael Frahm

arXiv:2203.07553·cs.CV·July 19, 2022

TL;DR

VPFusion is a unified neural implicit 3D reconstruction framework that combines 3D feature volumes with pixel-aligned image features, utilizing transformer-based multi-view fusion for improved accuracy in single and multi-view settings.
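To make the two feature types concrete, here is a minimal NumPy sketch of how a query point could gather a pixel-aligned feature (project the point into the image, then sample the feature map bilinearly) and a volume feature (look up the nearest voxel), and concatenate the two. It is an illustration under stated assumptions, not the paper's implementation: the pinhole intrinsics `K`, the helper names `project`, `bilinear_sample`, and `nearest_voxel`, the nearest-voxel lookup, and the toy feature maps are all hypothetical.

```python
import numpy as np

def project(points_cam, K):
    """Project (N, 3) camera-space points (z > 0) to (N, 2) pixel coords (u, v)."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def bilinear_sample(fmap, uv):
    """Bilinearly sample an (H, W, C) feature map at (N, 2) pixel coords."""
    H, W, _ = fmap.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = fmap[v0, u0] * (1 - du) + fmap[v0, u1] * du
    bot = fmap[v1, u0] * (1 - du) + fmap[v1, u1] * du
    return top * (1 - dv) + bot * dv

def nearest_voxel(volume, grid_xyz):
    """Look up a (D, D, D, C) volume at (N, 3) coords normalized to [0, 1]."""
    D = volume.shape[0]
    idx = np.clip(np.rint(grid_xyz * (D - 1)).astype(int), 0, D - 1)
    return volume[idx[:, 0], idx[:, 1], idx[:, 2]]

# Toy data (assumed): a 5x5 feature map whose two channels store (u, v),
# so bilinear sampling simply returns the projected pixel coordinate.
uu, vv = np.meshgrid(np.arange(5), np.arange(5))
fmap = np.stack([uu, vv], axis=-1).astype(float)
volume = np.arange(8, dtype=float).reshape(2, 2, 2, 1)

K = np.array([[4.0, 0.0, 2.0],
              [0.0, 4.0, 2.0],
              [0.0, 0.0, 1.0]])
points_cam = np.array([[0.0, 0.0, 1.0], [0.125, 0.25, 1.0]])
grid_xyz = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])

pix = bilinear_sample(fmap, project(points_cam, K))   # pixel-aligned features
vol = nearest_voxel(volume, grid_xyz)                 # volume features
joint = np.concatenate([pix, vol], axis=-1)           # (N, 2 + 1) joint feature
```

In a learned system the joint feature would be passed to an implicit decoder that predicts occupancy or a similar quantity per query point; that decoder is omitted here.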

Contribution

It introduces a novel transformer-based pairwise view association architecture for multi-view feature fusion in 3D reconstruction.

Findings

01. Outperforms existing methods on ShapeNet and ModelNet datasets.

02. Achieves higher reconstruction quality with combined 3D and pixel-aligned features.

03. Demonstrates the effectiveness of transformer-based multi-view fusion.

Abstract

We introduce VPFusion, a unified single- and multi-view neural implicit 3D reconstruction framework. VPFusion attains high-quality reconstruction by using both a 3D feature volume, which captures 3D-structure-aware context, and pixel-aligned image features, which capture fine local detail. Existing approaches use RNNs, feature pooling, or attention computed independently in each view for multi-view fusion. RNNs suffer from long-term memory loss and permutation variance, while feature pooling or independently computed attention leaves the representation in each view unaware of the other views until the final pooling step. In contrast, we show improved multi-view feature fusion by establishing transformer-based pairwise view association. In particular, we propose a novel interleaved 3D reasoning and pairwise view association architecture for feature volume fusion across different views. Using this…
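The contrast between pooling and pairwise view association can be sketched in a few lines of NumPy. The example below is a simplified illustration, not the architecture from the paper: it applies a single-head self-attention across views independently at each spatial location, so every view's feature is updated with information from all other views before the final pooling. The function names, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are assumptions chosen for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_pool_fusion(feats):
    """Baseline: average (V, N, C) per-view features over the view axis."""
    return feats.mean(axis=0)

def pairwise_view_attention(feats, w_q, w_k, w_v):
    """Single-head attention across views at each of N locations.

    feats: (V, N, C) features for V views at N locations.
    Returns updated per-view features (V, N, D) and weights (N, V, V).
    """
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    # scores[n, i, j]: how strongly view i attends to view j at location n
    scores = np.einsum('ind,jnd->nij', q, k) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    updated = np.einsum('nij,jnd->ind', attn, v)
    return updated, attn

rng = np.random.default_rng(0)
V, N, C = 3, 4, 8                               # views, locations, channels
feats = rng.normal(size=(V, N, C))
w_q, w_k, w_v = (rng.normal(size=(C, C)) * 0.1 for _ in range(3))

updated, attn = pairwise_view_attention(feats, w_q, w_k, w_v)
fused = mean_pool_fusion(updated)               # (N, C) view-aware fused feature
```

Because no view ordering is encoded, permuting the input views permutes `updated` in the same way and leaves `fused` unchanged, which is the permutation invariance that the abstract notes RNN-based fusion lacks.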

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis