Confidence-Gated Robot Autonomy: When Does Uncertainty Actually Help?
Johannes A. Gaus, Jhon P.F. Charaja, Daniel Haeufle

TL;DR
This paper evaluates how uncertainty estimates affect act/defer decisions in threshold-gated robot autonomy, and identifies the conditions under which they help and those under which they do not.
Contribution
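It evaluates uncertainty for robotic gating with Spearman rank correlation, paired bootstrap equivalence testing, and act/defer agreement, and compares softmax heuristics, MC Dropout, and ensembles across three temporal activity-recognition benchmarks and a multi-seed embodied simulation.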
Findings
Below a dataset-dependent competence threshold, uncertainty provides only a weak and unstable error ranking.
Above this threshold, different heuristics produce similar gating behavior.
Uncertainty ranking remains stable under covariate shift but struggles with semantic OOD detection.
Abstract
Robotic systems often use predictive uncertainty to decide whether to act autonomously or defer to a fallback policy. In threshold-gated autonomy, uncertainty matters mainly through its ability to rank likely errors. Standard metrics such as expected calibration error and AUROC do not directly test whether uncertainty changes act/defer decisions. We therefore evaluate uncertainty using Spearman rank correlation, paired bootstrap equivalence testing, and act/defer agreement. Across three temporal activity-recognition benchmarks, we find a dataset-dependent competence regime below which uncertainty provides a weak and unstable error ranking. Above this regime, softmax heuristics, MC Dropout, and ensembles produce similar gating behavior, while threshold choice has a much larger effect on execution outcomes. A multi-seed embodied simulation shows the same pattern for collision rate and…
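Two of the evaluation quantities named in the abstract, the Spearman rank correlation between uncertainty and error and the act/defer agreement between two gates, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the "act when uncertainty is at or below the threshold" rule, the threshold of 0.5, and the toy data are all assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else float("nan")


def gate(uncertainty, threshold):
    """Assumed gating rule: act (True) if uncertainty <= threshold, else defer."""
    return [u <= threshold for u in uncertainty]


def agreement(decisions_a, decisions_b):
    """Fraction of samples on which two gates make the same act/defer choice."""
    same = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return same / len(decisions_a)


# Toy data, invented for illustration only (1 = prediction was wrong).
errors = [0, 0, 1, 0, 1, 1, 0, 0]
unc_a = [0.10, 0.20, 0.80, 0.15, 0.70, 0.90, 0.30, 0.25]  # hypothetical method A
unc_b = [0.05, 0.25, 0.85, 0.10, 0.60, 0.95, 0.35, 0.20]  # hypothetical method B

rho_a = spearman(unc_a, errors)
rho_b = spearman(unc_b, errors)
agree = agreement(gate(unc_a, 0.5), gate(unc_b, 0.5))
print(f"rho_a={rho_a:.3f} rho_b={rho_b:.3f} agreement={agree:.2f}")
```

In this toy case the two methods assign different uncertainty values but make the same act/defer choice on every sample. That is the kind of situation the agreement measure is meant to expose, and one that calibration-style metrics would not show directly.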
