Accelerated Online Risk-Averse Policy Evaluation in POMDPs with Theoretical Guarantees and Novel CVaR Bounds
Yaacov Pariente, Vadim Indelman

TL;DR
This paper presents a theoretical framework and practical algorithms for faster risk-averse policy evaluation in POMDPs using CVaR, with guarantees and bounds that enable safe action elimination and computational acceleration.
Contribution
It introduces new CVaR bounds based on auxiliary variables, develops estimators within a particle belief framework, and proposes an action elimination method for efficient risk-averse POMDP evaluation.
Findings
Bounds effectively distinguish safe and risky policies
Significant computational speedups achieved
Method maintains theoretical guarantees
Abstract
Risk-averse decision-making under uncertainty in partially observable domains is a central challenge in artificial intelligence and is essential for developing reliable autonomous agents. The formal framework for such problems is the partially observable Markov decision process (POMDP), where risk sensitivity is introduced through a risk measure applied to the value function, with Conditional Value-at-Risk (CVaR) being a particularly significant criterion. However, solving POMDPs is computationally intractable in general, and approximate methods rely on computationally expensive simulations of future agent trajectories. This work introduces a theoretical framework for accelerating CVaR value function evaluation in POMDPs with formal performance guarantees. We derive new bounds on the CVaR of a random variable X using an auxiliary random variable Y, under assumptions relating their…
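To make the CVaR criterion concrete, here is a minimal sketch of a plain sample-based CVaR estimate. It assumes the samples are costs, so the risky tail is the largest values, and it defines CVaR at level alpha as the mean of the worst alpha-fraction of samples. The function name `empirical_cvar` and this tail convention are illustrative assumptions; this is not the paper's particle-belief estimator or its bounds.

```python
import math


def empirical_cvar(samples, alpha):
    """Sample-based CVaR sketch (assumption: samples are costs, higher is worse).

    Returns the mean of the ceil(alpha * n) largest samples.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if not samples:
        raise ValueError("samples must be non-empty")
    n = len(samples)
    # Number of samples in the worst-case tail; always keep at least one.
    k = max(1, math.ceil(alpha * n))
    # Sort descending so the largest costs come first, then keep the tail.
    tail = sorted(samples, reverse=True)[:k]
    return sum(tail) / k


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0, 10.0]
    print(empirical_cvar(costs, 0.5))  # mean of the two largest costs
    print(empirical_cvar(costs, 1.0))  # alpha = 1 reduces to the plain mean
```

With alpha = 1 the estimate reduces to the ordinary sample mean, while smaller alpha focuses on progressively rarer, costlier outcomes, which is why CVaR evaluation typically needs many simulated trajectories.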
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Bayesian Modeling and Causal Inference
