Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function

Yaacov Pariente; Vadim Indelman

arXiv:2601.20554 · cs.AI · January 29, 2026

TL;DR

This paper develops risk-sensitive planning algorithms for POMDPs using the ICVaR measure, providing finite-time guarantees and demonstrating reduced tail risk in benchmark domains.

Contribution

The paper extends three online POMDP planning algorithms (Sparse Sampling, PFT-DPW, and POMCPOW) to optimize the ICVaR value function, with finite-time guarantees for ICVaR Sparse Sampling and risk-averse exploration strategies.

Findings

01

ICVaR planners achieve lower tail risk in benchmarks

02

Finite-time performance guarantees are established for ICVaR Sparse Sampling

03

Risk parameter $α$ controls the level of risk aversion: $α = 1$ recovers expectation-based planning, and $α < 1$ increases risk aversion

Abstract

We study risk-sensitive planning under partial observability using the dynamic risk measure Iterated Conditional Value-at-Risk (ICVaR). A policy evaluation algorithm for ICVaR is developed with finite-time performance guarantees that do not depend on the cardinality of the action space. Building on this foundation, three widely used online planning algorithms--Sparse Sampling, Particle Filter Trees with Double Progressive Widening (PFT-DPW), and Partially Observable Monte Carlo Planning with Observation Widening (POMCPOW)--are extended to optimize the ICVaR value function rather than the expectation of the return. Our formulations introduce a risk parameter $α$, where $α = 1$ recovers standard expectation-based planning and $α < 1$ induces increasing risk aversion. For ICVaR Sparse Sampling, we establish finite-time performance guarantees under the risk-sensitive…
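To make the role of $α$ concrete, here is a minimal Python sketch of an iterated CVaR backup on a small, fixed sample tree. It is an illustration only, not the paper's algorithm: the function names `cvar_lower` and `icvar_value`, the tuple-based tree layout, the equal weighting of sampled children, and the convention that risk means the lower tail of the return are all assumptions made for this example. The sketch evaluates a fixed tree and does not model actions, beliefs, or observations.

```python
import math


def cvar_lower(samples, alpha):
    """Empirical CVaR at level alpha: mean of the worst (lowest) alpha-fraction.

    Assumes equally weighted samples and that lower values are worse
    (a return, not a cost). alpha = 1 gives the plain sample mean.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    xs = sorted(samples)
    mass = alpha * len(xs)          # tail mass, measured in samples
    k = int(math.floor(mass))       # samples that lie fully inside the tail
    total = sum(xs[:k])
    if k < len(xs) and mass > k:    # partial weight on the boundary sample
        total += (mass - k) * xs[k]
    return total / mass


def icvar_value(node, alpha, gamma=1.0):
    """Iterated CVaR of a sample tree: CVaR is applied at every depth.

    node is a hypothetical (reward, children) tuple; leaves have no children.
    """
    reward, children = node
    if not children:
        return reward
    child_values = [icvar_value(child, alpha, gamma) for child in children]
    return reward + gamma * cvar_lower(child_values, alpha)


# Two branches with the same mean return (5) but different spread.
risky = (0, [(0, []), (10, [])])
safe = (0, [(4, []), (6, [])])
tree = (0, [risky, safe])

print(icvar_value(tree, alpha=1.0))  # expectation of the return
print(icvar_value(tree, alpha=0.5))  # worst-half tail, applied at each level
```

With `alpha=1.0` the backup reduces to the ordinary expected return of the tree. With `alpha=0.5` each level keeps only the worst half of its children, so the poor outcome in the risky branch dominates the value.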

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · AI-based Problem Solving and Planning