VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization

Kaichen Zhou; Changhao Chen; Bing Wang; Muhamad Risqi U. Saputra; Niki Trigoni; Andrew Markham

arXiv:2003.07289 · cs.CV · January 22, 2025 · 1 cites

TL;DR

VMLoc is a variational fusion framework for multimodal camera localization. It combines image and depth data through a variational Product-of-Experts (PoE) followed by attention-based fusion, and it outperforms previous methods, especially when inputs are degraded.

Contribution

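The paper presents VMLoc, an end-to-end variational fusion approach to multimodal camera localization. It replaces naive feature-space fusion (summation or concatenation) with a variational PoE followed by attention-based fusion, and it is designed to cope with degraded or missing inputs.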

Findings

1. Outperforms previous multimodal localization methods on RGB-D datasets.

2. Effectively fuses image and depth modalities using a variational Product-of-Experts.

3. Handles degraded or missing input data robustly.
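
The Product-of-Experts idea behind findings 2 and 3 can be illustrated with a minimal sketch. It assumes each modality encoder outputs a diagonal Gaussian over a shared latent space, and it is not taken from the VMLoc repository: the function name `product_of_experts`, the `use_prior` flag, and the example numbers are hypothetical. For Gaussian experts the product is again Gaussian, with precisions (inverse variances) that add and a precision-weighted mean, so a missing modality can simply be left out of the product.

```python
from typing import List, Optional, Sequence, Tuple

# A diagonal Gaussian expert: (means, variances), one entry per latent dimension.
Gaussian = Tuple[List[float], List[float]]


def product_of_experts(
    experts: Sequence[Optional[Gaussian]],
    use_prior: bool = True,  # assumption: optionally add a standard-normal prior expert
) -> Gaussian:
    """Fuse diagonal Gaussian experts; a None entry marks a missing modality."""
    present = [e for e in experts if e is not None]
    if not present:
        raise ValueError("at least one modality must be present")
    dim = len(present[0][0])

    # Precisions add; the prior N(0, 1) contributes precision 1 and mean 0.
    precision = [1.0 if use_prior else 0.0] * dim
    weighted_mean = [0.0] * dim
    for means, variances in present:
        for d in range(dim):
            precision[d] += 1.0 / variances[d]
            weighted_mean[d] += means[d] / variances[d]

    fused_var = [1.0 / p for p in precision]
    fused_mean = [w * v for w, v in zip(weighted_mean, fused_var)]
    return fused_mean, fused_var


# Hypothetical one-dimensional encoder outputs, for illustration only.
image_expert = ([1.0], [1.0])
depth_expert = ([3.0], [0.25])  # smaller variance, i.e. more confident

both = product_of_experts([image_expert, depth_expert], use_prior=False)
image_only = product_of_experts([image_expert, None], use_prior=False)
```

In this toy case the fused mean lands closer to the more confident depth expert, and dropping the depth expert returns the image expert unchanged. The attention-based fusion stage that the paper applies after the PoE is not covered by this sketch.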

Abstract

Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and to deal with degraded or missing input are less well studied. In particular, we note that previous approaches towards deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of the naive approaches to feature space fusion through summation or concatenation which do not take into account the different strengths of each modality. To address this, we propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space through a variational Product-of-Experts (PoE) followed by attention-based fusion. Unlike previous multimodal variational works directly adapting the objective function of vanilla…

Code & Models

Repositories

kaichen-z/vmloc · PyTorch · Official

Videos

VMLoc: Variational Fusion for Learning-Based Multimodal Camera Localization

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Domain Adaptation and Few-Shot Learning