VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization
Kaichen Zhou, Changhao Chen, Bing Wang, Muhamad Risqi U. Saputra, Niki Trigoni, Andrew Markham

TL;DR
VMLoc is a variational fusion framework for multimodal camera localization. It combines image and depth data in a common latent space through a variational Product-of-Experts (PoE) followed by attention-based fusion, and it outperforms previous methods, especially when inputs are degraded.
Contribution
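The paper presents VMLoc, an end-to-end variational fusion approach to multimodal camera localization. It replaces naive feature fusion by summation or concatenation, which ignores the different strengths of each modality, and it is designed to cope with degraded or missing inputs.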
Findings
Outperforms previous multimodal localization methods on RGB-D datasets.
Effectively fuses image and depth modalities using a variational Product-of-Experts.
Handles degraded or missing input data robustly.
Abstract
Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and to deal with degraded or missing input are less well studied. In particular, we note that previous approaches towards deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of the naive approaches to feature space fusion through summation or concatenation which do not take into account the different strengths of each modality. To address this, we propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space through a variational Product-of-Experts (PoE) followed by attention-based fusion. Unlike previous multimodal variational works directly adapting the objective function of vanilla…
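As a rough illustration of the Product-of-Experts idea mentioned in the abstract, the sketch below fuses per-modality diagonal Gaussians using the standard closed form, in which precisions add and means are precision-weighted. It is not taken from the VMLoc code: the function name `product_of_experts`, the `(mean, log-variance)` input layout, and the optional standard-normal prior expert (a common choice in multimodal variational models) are all assumptions made for this example. A missing modality is represented by `None` and simply skipped.

```python
import math


def product_of_experts(experts, dim, include_prior=True):
    """Fuse diagonal Gaussian experts into one diagonal Gaussian.

    experts: list of (mu, logvar) pairs, each a list of length `dim`,
             or None for a missing modality (hypothetical convention).
    Returns (fused_mu, fused_var) as lists of length `dim`.
    """
    precision_sum = [0.0] * dim
    weighted_mean_sum = [0.0] * dim

    if include_prior:
        # Assumed prior expert N(0, 1): precision 1, mean 0 in every dimension.
        for d in range(dim):
            precision_sum[d] += 1.0

    for expert in experts:
        if expert is None:
            continue  # missing modality contributes nothing
        mu, logvar = expert
        for d in range(dim):
            precision = math.exp(-logvar[d])  # 1 / variance
            precision_sum[d] += precision
            weighted_mean_sum[d] += mu[d] * precision

    if any(p == 0.0 for p in precision_sum):
        raise ValueError("no experts available to fuse")

    fused_mu = [m / p for m, p in zip(weighted_mean_sum, precision_sum)]
    fused_var = [1.0 / p for p in precision_sum]
    return fused_mu, fused_var


# Hypothetical 2-D latents for an image encoder and a depth encoder.
image_expert = ([1.0, 0.0], [0.0, 0.0])            # variance 1 in both dims
depth_expert = ([3.0, 2.0], [0.0, math.log(0.5)])  # variance 1 and 0.5

fused_mu, fused_var = product_of_experts([image_expert, depth_expert], dim=2)

# Same call with the depth input missing.
image_only_mu, image_only_var = product_of_experts([image_expert, None], dim=2)
```

In this form a more confident expert (lower variance) pulls the fused mean toward its own estimate, which is one way a fusion scheme can account for the different strengths of each modality rather than treating them equally, as summation or concatenation would.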
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Domain Adaptation and Few-Shot Learning
