Will My Robot Achieve My Goals? Predicting the Probability that an MDP Policy Reaches a User-Specified Behavior Target
Alexander Guyer, Thomas G. Dietterich

TL;DR
This paper introduces a method to predict the probability that an autonomous system's performance will meet user-defined goals, enabling better decision-making through calibrated probability estimates.
Contribution
It develops PCQR, an invertible extension of conformalized quantile regression, for calibrated probability estimation of cumulative rewards in autonomous systems.
Findings
The probability estimates are well calibrated in experiments.
The method provides finite-sample marginal guarantees.
The system can raise an alert when the probability of achieving the goal drops below a threshold.
Abstract
As an autonomous system performs a task, it should maintain a calibrated estimate of the probability that it will achieve the user's goal. If that probability falls below some desired level, it should alert the user so that appropriate interventions can be made. This paper considers settings where the user's goal is specified as a target interval for a real-valued performance summary, such as the cumulative reward, measured at a fixed horizon. At each time step, our method produces a calibrated estimate of the probability that the final cumulative reward will fall within a user-specified target interval. Using this estimate, the autonomous system can raise an alarm if the probability drops below a specified threshold. We compute the probability estimates by inverting conformal prediction. Our starting point is the Conformalized Quantile Regression…
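The abstract is cut off before the method is described, so the following is only a rough illustration of the general idea of turning held-out calibration errors into an interval probability and an alarm. It is not the paper's PCQR procedure. It assumes a point predictor of the final cumulative reward and a set of signed calibration residuals (observed minus predicted); the function names, the residual values, and the thresholds are all hypothetical.

```python
def interval_probability(cal_residuals, y_pred, lo, hi):
    """Estimate P(lo <= Y <= hi) from signed calibration residuals.

    Assumption: residuals (y - y_hat) on held-out calibration episodes are
    exchangeable with the residual of the current episode. Dividing by
    n + 1 instead of n makes the estimate slightly conservative.
    """
    inside = sum(1 for r in cal_residuals if lo - y_pred <= r <= hi - y_pred)
    return inside / (len(cal_residuals) + 1)


def should_alarm(prob, threshold):
    """Raise an alarm when the estimated probability falls below the threshold."""
    return prob < threshold


# Hypothetical calibration residuals and a hypothetical current prediction.
cal_residuals = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
p = interval_probability(cal_residuals, y_pred=10.0, lo=9.0, hi=11.5)
alarm = should_alarm(p, threshold=0.8)
```

Here five of the nine residuals place the outcome inside the target interval [9.0, 11.5], so the estimate is 5 / 10 = 0.5 and an alarm is raised at a threshold of 0.8. In the setting the abstract describes, such an estimate would be recomputed at each time step as the prediction is updated.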
Taxonomy
Topics: Innovative Approaches in Technology and Social Development
