TL;DR
This paper introduces a learned local descriptor for matching 2D images to 3D point clouds, built on a dual auto-encoder. The descriptor is reported to be more discriminative and robust on cross-domain tasks (2D-3D matching, cross-domain retrieval, and sparse-to-dense depth estimation) and to generalize to single-domain tasks.
Contribution
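The paper proposes a dual auto-encoder neural network that maps 2D image and 3D point cloud input into a shared latent space, along with a new dataset of millions of 2D-3D correspondences, collected from publicly available RGB-D scenes, for training and evaluation.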
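To make the idea of a shared latent space concrete, the following is a minimal sketch and not the paper's architecture. It assumes single linear layers in place of the real encoders and decoders, flattened input vectors, and a three-term objective (two reconstruction terms plus a latent alignment term); the summary above does not specify the actual layers or losses. The names `make_params`, `encode`, `decode`, and `dual_autoencoder_losses` are hypothetical.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def init(rows, cols, rng):
    """Small random weight matrix with `rows` outputs and `cols` inputs."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def make_params(patch_dim, cloud_dim, latent_dim, rng):
    # Assumption: one linear layer per encoder/decoder stands in for the
    # real 2D and 3D networks, which the summary does not describe.
    return {
        "enc2d": init(latent_dim, patch_dim, rng),
        "dec2d": init(patch_dim, latent_dim, rng),
        "enc3d": init(latent_dim, cloud_dim, rng),
        "dec3d": init(cloud_dim, latent_dim, rng),
    }

# Encoding and decoding are both plain linear maps in this sketch.
encode = linear
decode = linear

def dual_autoencoder_losses(patch, cloud, params):
    """Loss terms for one corresponding (2D patch, 3D point set) pair."""
    z2d = encode(params["enc2d"], patch)   # 2D branch -> shared space
    z3d = encode(params["enc3d"], cloud)   # 3D branch -> shared space
    return {
        # Each branch should reconstruct its own input.
        "recon_2d": mse(decode(params["dec2d"], z2d), patch),
        "recon_3d": mse(decode(params["dec3d"], z3d), cloud),
        # Assumed alignment term: corresponding pairs share a code.
        "latent": mse(z2d, z3d),
    }

rng = random.Random(0)
params = make_params(patch_dim=6, cloud_dim=9, latent_dim=4, rng=rng)
losses = dual_autoencoder_losses([0.5] * 6, [0.1] * 9, params)
print(losses)
```

The element-wise error on the 3D branch treats the point set as an ordered vector, which is a simplification; a point cloud branch would normally need an order-invariant reconstruction loss.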
Findings
Descriptors learned in the shared latent space are more discriminative than those trained separately in the 2D and 3D domains.
The method outperforms existing approaches in cross-domain matching.
The model generalizes well to 2D-only and 3D-only tasks.
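Once 2D and 3D descriptors live in the same space, cross-domain matching can in principle reduce to nearest-neighbor search. The sketch below illustrates that idea under stated assumptions: descriptors are plain lists of floats, Euclidean distance is the metric, and the optional `max_dist` threshold is a hypothetical parameter. It is not the paper's evaluation protocol, and `match_2d_to_3d` is an illustrative name.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def match_2d_to_3d(desc_2d, desc_3d, max_dist=float("inf")):
    """For each 2D descriptor, find the closest 3D descriptor.

    Returns a list of (index_2d, index_3d, distance) tuples, skipping
    pairs whose distance exceeds `max_dist`.
    """
    if not desc_3d:
        return []
    matches = []
    for i, d in enumerate(desc_2d):
        # Brute-force search; fine for a sketch, slow for large sets.
        j, dist = min(
            ((j, l2(d, e)) for j, e in enumerate(desc_3d)),
            key=lambda pair: pair[1],
        )
        if dist <= max_dist:
            matches.append((i, j, dist))
    return matches

# Toy descriptors already embedded in a shared 2-dimensional space.
image_desc = [[0.0, 0.0], [1.0, 1.0]]
cloud_desc = [[1.0, 0.9], [0.1, 0.0]]
for i, j, dist in match_2d_to_3d(image_desc, cloud_desc):
    print(f"2D patch {i} <-> 3D patch {j} (distance {dist:.3f})")
```

The same function covers cross-domain retrieval if the 3D list is treated as a database and each 2D descriptor as a query.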
Abstract
In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D input into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative than those obtained from individual training in 2D and 3D domains. To facilitate the training process, we built a new dataset by collecting millions of 2D-3D correspondences with various lighting conditions and settings from publicly available RGB-D scenes. Our descriptor is evaluated in three main experiments: 2D-3D matching, cross-domain retrieval, and sparse-to-dense depth estimation. Experimental results confirm the robustness of our approach as well as its competitive performance not only in solving cross-domain tasks but also…
