MUSE: Multimodal Uncertainty Quantification of State Estimation

Minkyung Kim; Henry Che; Bhargav Chandaka; Bhumsitt Pramuanpornsatid; Chengyu Yang; Sheng Cheng; Xiaofeng Wang; Naira Hovakimyan; Shenlong Wang

arXiv:2605.17421 · cs.RO · May 19, 2026

TL;DR

MUSE is a real-time learning-based framework that quantifies uncertainty in visual state estimation by leveraging multimodal sensor data, improving reliability and robustness in robotics applications.

Contribution

Introduces MUSE, a framework for multimodal uncertainty quantification in state estimation that uses Mamba for sequential modeling of asynchronous sensor streams.

Findings

01. MUSE outperforms existing methods in reliability and robustness.

02. Experiments on public and in-house datasets validate MUSE's effectiveness.

03. Ablation studies confirm the importance of key design choices.

Abstract

Accurate visual state estimation has been a central topic in robotics with a wide range of applications in robot navigation, autonomous driving, and autonomous flight. Recent advances in robot perception have led to significant improvements in the accuracy and robustness of state estimation, yet a fundamental challenge remains in how to quantify and calibrate its precision, i.e., how confident we are in an estimate and whether failures can be detected. This issue is particularly pronounced in visual-inertial odometry (VIO), where the heteroscedastic and multimodal nature of the problem makes uncertainty quantification especially difficult. This paper introduces MUSE (Multimodal Uncertainty Quantification of State Estimation), a novel real-time learning-based framework that leverages the strong and efficient sequential modeling capacity of Mamba to estimate localization uncertainty from…
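To make the calibration question in the abstract concrete, here is a minimal sketch of how a predicted per-frame standard deviation can be scored against the error actually observed. It is not MUSE's loss or evaluation protocol, which the truncated abstract does not specify. The function names `gaussian_nll` and `coverage` and all numbers are hypothetical and chosen only for illustration. The sketch assumes a scalar error modeled as zero-mean Gaussian with a per-sample (heteroscedastic) sigma.

```python
import math


def gaussian_nll(error, sigma):
    """Negative log-likelihood of a scalar error under N(0, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + 0.5 * (error / sigma) ** 2


def coverage(errors, sigmas, k=1.0):
    """Fraction of errors that fall within k predicted standard deviations."""
    inside = sum(1 for e, s in zip(errors, sigmas) if abs(e) <= k * s)
    return inside / len(errors)


def mean_nll(errors, sigmas):
    """Average Gaussian NLL over all samples (lower is better)."""
    return sum(gaussian_nll(e, s) for e, s in zip(errors, sigmas)) / len(errors)


# Hypothetical per-frame position errors in metres; the third frame is a failure.
errors = [0.02, -0.04, 0.40, 0.03]

# A fixed sigma ignores the failure; an adaptive sigma grows on the bad frame.
sigmas_fixed = [0.05, 0.05, 0.05, 0.05]
sigmas_adaptive = [0.05, 0.05, 0.50, 0.05]

nll_fixed = mean_nll(errors, sigmas_fixed)
nll_adaptive = mean_nll(errors, sigmas_adaptive)

print(f"fixed:    mean NLL = {nll_fixed:.3f}, 1-sigma coverage = {coverage(errors, sigmas_fixed):.2f}")
print(f"adaptive: mean NLL = {nll_adaptive:.3f}, 1-sigma coverage = {coverage(errors, sigmas_adaptive):.2f}")
```

In this toy setup the adaptive sigma receives a lower mean NLL because it assigns a large uncertainty to the one frame with a large error, which is the sense in which an uncertainty estimate can flag a failure.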
