Partially Observable Risk-Sensitive Markov Decision Processes
Nicole Bäuerle, Ulrich Rieder

TL;DR
This paper addresses risk-sensitive decision-making in partially observable Markov decision processes by embedding the problem into an observable MDP with an extended state space, providing conditions for optimal policies and simplifying the exponential utility case.
Contribution
It introduces a novel approach to solve risk-sensitive POMDPs without change of measure techniques by extending the state space with joint distributions.
Findings
Optimal policies exist under certain conditions.
The exponential utility case simplifies the problem significantly.
A numerical example illustrates the impact of the certainty equivalent's parameters.
Abstract
We consider the problem of minimizing a certainty equivalent of the total or discounted cost over a finite and an infinite time horizon which is generated by a Partially Observable Markov Decision Process (POMDP). The certainty equivalent is defined by $U^{-1}(EU(Y))$ where $U$ is an increasing function. In contrast to a risk-neutral decision maker this optimization criterion takes the variability of the cost into account. It contains as a special case the classical risk-sensitive optimization criterion with an exponential utility. We show that this optimization problem can be solved by embedding the problem into a completely observable Markov Decision Process with extended state space and give conditions under which an optimal policy exists. The state space has to be extended by the joint conditional distribution of current unobserved state and accumulated cost. In case of an…
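To make the optimization criterion concrete, here is a minimal Python sketch of a certainty equivalent for a discrete random cost. It assumes the standard form U^{-1}(E[U(Y)]) with an increasing function U, and uses the exponential utility U(y) = exp(gamma * y) as the special case. The function names, the two-point cost distribution, and the value of gamma are illustrative assumptions, not taken from the paper.

```python
import math


def certainty_equivalent(costs, probs, U, U_inv):
    """Return U^{-1}(E[U(Y)]) for a discrete cost Y (hypothetical helper)."""
    expected_utility = sum(p * U(y) for y, p in zip(costs, probs))
    return U_inv(expected_utility)


def exponential_ce(costs, probs, gamma):
    """Certainty equivalent under U(y) = exp(gamma * y), assuming gamma > 0."""
    return certainty_equivalent(
        costs,
        probs,
        U=lambda y: math.exp(gamma * y),
        U_inv=lambda u: math.log(u) / gamma,
    )


# Illustrative two-point cost: 0 or 10 with equal probability (assumed data).
costs = [0.0, 10.0]
probs = [0.5, 0.5]

mean_cost = sum(p * y for y, p in zip(costs, probs))  # risk-neutral criterion
ce = exponential_ce(costs, probs, gamma=0.5)          # risk-sensitive criterion

print(f"expected cost:        {mean_cost:.3f}")
print(f"certainty equivalent: {ce:.3f}")
```

With gamma > 0 the certainty equivalent of a non-degenerate cost lies above its expectation (by Jensen's inequality), which is one way to see how the criterion penalizes variability. For a deterministic cost the two coincide.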
