TL;DR
FMF-SLAM is an efficient multimodal fusion SLAM method that uses Fourier-based attention mechanisms and multi-scale knowledge distillation to improve robustness and real-time performance in challenging environments.
Contribution
The paper introduces a novel Fourier transform-based self-attention and cross-attention mechanism for multimodal feature extraction in SLAM, enhancing efficiency and robustness.
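To make the idea of Fourier-based attention concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the summary above does not specify the layer design, so the self-mixing step follows the well-known FNet-style recipe (keep the real part of a 2D FFT), and the cross-modal step is a hypothetical stand-in that multiplies the RGB and depth spectra along the token axis. The function names, token count, and channel count are all assumptions chosen for illustration.

```python
import numpy as np

def fourier_self_mix(x):
    """FNet-style token mixing (assumed design, not confirmed by the source):
    real part of a 2D FFT taken over the (tokens, channels) axes."""
    return np.fft.fft2(x, axes=(-2, -1)).real

def fourier_cross_mix(rgb, depth):
    """Hypothetical cross-modal mixing: multiply the token-axis spectra of the
    two modalities elementwise. By the convolution theorem this equals a
    circular convolution of rgb and depth along the token axis, per channel."""
    spectrum = np.fft.fft(rgb, axis=0) * np.fft.fft(depth, axis=0)
    return np.fft.ifft(spectrum, axis=0).real

# Illustrative inputs: 16 tokens with 8 channels each (arbitrary sizes).
rng = np.random.default_rng(0)
rgb_tokens = rng.standard_normal((16, 8))
depth_tokens = rng.standard_normal((16, 8))

self_mixed = fourier_self_mix(rgb_tokens)
cross_mixed = fourier_cross_mix(rgb_tokens, depth_tokens)
```

The appeal of this family of layers is cost: an FFT over N tokens takes O(N log N) operations, whereas standard dot-product attention builds an N x N score matrix.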
Findings
Achieves state-of-the-art performance in noisy and dark environments.
Operates in real-time when integrated with a security robot.
Validated on multiple datasets including TUM and TartanAir.
Abstract
Visual SLAM is particularly challenging in environments affected by noise, varying lighting conditions, and darkness. Learning-based optical flow algorithms can leverage multiple modalities to address these challenges, but traditional optical flow-based visual SLAM approaches often require significant computational resources. To overcome this limitation, we propose FMF-SLAM, an efficient multimodal fusion SLAM method that uses the fast Fourier transform (FFT) to improve algorithmic efficiency. Specifically, we introduce a novel Fourier-based self-attention and cross-attention mechanism to extract features from RGB and depth signals. We further enhance the interaction of multimodal features by incorporating multi-scale knowledge distillation across modalities. We also demonstrate the practical feasibility of FMF-SLAM in real-world scenarios with real-time performance by integrating it…
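The abstract does not state how the multi-scale distillation loss is defined, so the following is only a sketch of one common formulation: at each pyramid scale, channel-normalize the feature maps of the two modality branches and penalize their mean squared difference. The names `multiscale_distill_loss`, `l2_normalize`, and `weights`, the feature shapes, and the choice of which branch plays the teacher role are all hypothetical.

```python
import numpy as np

def l2_normalize(feat, eps=1e-8):
    """Normalize a (C, H, W) feature map to unit length along the channel axis."""
    return feat / (np.linalg.norm(feat, axis=0, keepdims=True) + eps)

def multiscale_distill_loss(student_feats, teacher_feats, weights=None):
    """Weighted sum over scales of the MSE between normalized feature maps.
    An assumed loss form; the source does not specify the actual one."""
    if weights is None:
        weights = [1.0] * len(student_feats)
    total = 0.0
    for w, s, t in zip(weights, student_feats, teacher_feats):
        total += w * np.mean((l2_normalize(s) - l2_normalize(t)) ** 2)
    return total

# Illustrative three-level pyramids: (channels, height, width) per scale.
rng = np.random.default_rng(0)
shapes = [(8, 32, 32), (16, 16, 16), (32, 8, 8)]
rgb_pyramid = [rng.standard_normal(s) for s in shapes]
depth_pyramid = [rng.standard_normal(s) for s in shapes]

loss = multiscale_distill_loss(depth_pyramid, rgb_pyramid)
```

Normalizing before comparison keeps the loss from being dominated by differences in feature magnitude between the RGB and depth branches, which is one reason this form is often chosen.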
Taxonomy
Methods: Knowledge Distillation
