NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF
Stefan Lionar, Xiangyu Xu, Min Lin, Gim Hee Lee

TL;DR
NU-MCC introduces a neighborhood decoder and a repulsive UDF to enhance 3D reconstruction from single-view RGB-D images, achieving faster inference and higher-fidelity results than previous MCC methods.
Contribution
The paper proposes NU-MCC, a novel 3D reconstruction method that improves efficiency and detail recovery by using a neighborhood decoder and a repulsive UDF.
Findings
Outperforms MCC by 9.7% in F1-score on the CO3D-v2 dataset.
Achieves more than 5x faster inference.
Produces more complete and detailed 3D reconstructions.
Abstract
Remarkable progress has been made in 3D reconstruction from single-view RGB-D inputs. MCC is the current state-of-the-art method in this field, achieving unprecedented success by combining vision Transformers with large-scale training. However, we identified two key limitations of MCC: 1) the Transformer decoder is inefficient at handling a large number of query points; 2) the 3D representation struggles to recover high-fidelity details. In this paper, we propose a new approach called NU-MCC that addresses these limitations. NU-MCC includes two key innovations: a Neighborhood decoder and a Repulsive Unsigned Distance Function (Repulsive UDF). First, our Neighborhood decoder introduces center points as an efficient proxy of input visual features, allowing each query point to attend only to a small neighborhood. This design not only results in much faster inference speed but also…
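The neighborhood idea in the abstract (each query point attends only to a few nearby center points rather than to all input features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `k_nearest_centers`, the toy coordinates, the choice of `k`, and the use of plain Euclidean distance to pick the neighborhood are all assumptions made for illustration.

```python
import math

def k_nearest_centers(query, centers, k):
    """Return indices of the k centers closest to `query` (Euclidean distance).

    Hypothetical helper: the source only says each query attends to a small
    neighborhood of center points; how that neighborhood is chosen is assumed here.
    """
    # Sort center indices by distance to the query; sorted() is stable,
    # so ties keep the lower index first.
    order = sorted(range(len(centers)), key=lambda i: math.dist(query, centers[i]))
    return order[:k]

# Hypothetical toy data: four center points and one query point in 3D.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
query = (0.9, 0.1, 0.0)

# The query would attend to these 2 centers instead of all 4.
neighborhood = k_nearest_centers(query, centers, k=2)
print(neighborhood)
```

With a fixed `k`, the number of center points each query looks at stays constant as the number of centers grows, which is the intuition behind the faster decoding the abstract describes.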
Taxonomy
Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Computer Graphics and Visualization Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Linear Layer
