Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
Neel P. Bhatt, Yunhao Yang, Rohan Siva, Daniel Milan, Ufuk Topcu, Zhangyang Wang

TL;DR
This paper introduces a formal framework for disentangling, quantifying, and mitigating perception and decision uncertainties in multimodal foundation models used for robotic planning, leading to improved robustness and success rates.
Contribution
It presents a novel framework for uncertainty disentanglement, tailored quantification methods, and targeted interventions to enhance robotic planning reliability.
Findings
Uncertainty disentanglement reduces variability by up to 40%.
Task success rates improve by 5% with interventions.
The framework proves effective in both real-world and simulated tasks.
Abstract
Multimodal foundation models offer a promising framework for robotic perception and planning by processing sensory inputs to generate actionable plans. However, addressing uncertainty in both perception (sensory interpretation) and decision-making (plan generation) remains a critical challenge for ensuring task reliability. We present a comprehensive framework to disentangle, quantify, and mitigate these two forms of uncertainty. We first introduce a framework for uncertainty disentanglement, isolating perception uncertainty arising from limitations in visual understanding and decision uncertainty relating to the robustness of generated plans. To quantify each type of uncertainty, we propose methods tailored to the unique properties of perception and decision-making: we use conformal prediction to calibrate perception uncertainty and introduce Formal-Methods-Driven Prediction (FMDP)…
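The abstract names conformal prediction as the tool for calibrating perception uncertainty but, as truncated here, gives no details. As a rough illustration only, the sketch below shows generic split conformal prediction for a classifier, not the authors' implementation. The function names, the nonconformity score (one minus the probability assigned to a label), the object labels, and all numbers are hypothetical and chosen for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the finite-sample-corrected (1 - alpha) quantile of the
    calibration nonconformity scores (standard split conformal rule)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the quantile
    if k > n:
        # Too few calibration points for this alpha: accept every label.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every label whose nonconformity score (1 - p) is within the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]


# Hypothetical calibration data: the probability a perception model assigned
# to the TRUE label on nine held-out images.
cal_true_probs = [0.9, 0.5, 0.95, 0.3, 0.7, 0.85, 0.4, 0.99, 0.6]
cal_scores = [1.0 - p for p in cal_true_probs]

# Target roughly 75% coverage (alpha = 0.25).
q_hat = conformal_threshold(cal_scores, alpha=0.25)

# A confident prediction yields a single-label set ...
clear_set = prediction_set({"cup": 0.8, "bowl": 0.15, "plate": 0.05}, q_hat)
# ... while an ambiguous one yields several labels, signalling high
# perception uncertainty.
ambiguous_set = prediction_set({"cup": 0.45, "bowl": 0.45, "plate": 0.10}, q_hat)

print(q_hat, clear_set, ambiguous_set)
```

In this sketch the size of the prediction set stands in for perception uncertainty: a single-label set suggests the model's visual interpretation can be trusted at the chosen coverage level, while a larger set flags an input that may need intervention before planning.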
Taxonomy
Topics: Semantic Web and Ontologies · Speech and dialogue systems · Multi-Agent Systems and Negotiation
