Risk-Averse $\omega$-regular Markov Decision Process Control
Ruediger Ehlers, Salar Moarref, Ufuk Topcu

TL;DR
This paper introduces a new risk-averse optimization criterion for Markov decision processes to handle infinite-time horizon specifications where failure cannot be entirely avoided, with algorithms and validation in robot control scenarios.
Contribution
It proposes a novel risk-averse policy computation method for MDPs under infinite-time horizon specifications where failures are inevitable, extending existing approaches.
Findings
Algorithms successfully compute risk-averse policies.
Policies balance optimism with risk aversion.
Validated in two robot control scenarios.
Abstract
Many control problems in environments that can be modeled as Markov decision processes (MDPs) concern infinite-time horizon specifications. The classical aim in this context is to compute a control policy that maximizes the probability of satisfying the specification. In many scenarios, there is however a non-zero probability of failure in every step of the system's execution. For infinite-time horizon specifications, this implies that the specification is violated with probability 1 in the long run no matter what policy is chosen, which prevents previous policy computation methods from being useful in these scenarios. In this paper, we introduce a new optimization criterion for MDP policies that captures the task of working towards the satisfaction of some infinite-time horizon $\omega$-regular specification. The new criterion is applicable to MDPs in which the violation of the…
Taxonomy
Topics: Formal Methods in Verification · Real-Time Systems Scheduling · Petri Nets in System Modeling
