Risk-Sensitive Reinforcement Learning Applied to Control under Constraints
P. Geibel, F. Wysotzki

TL;DR
This paper introduces a risk-sensitive reinforcement learning algorithm for Markov Decision Processes with error states, enabling control under safety constraints without relying on strict model assumptions.
Contribution
It formalizes risk as a second criterion in constrained MDPs and presents a model-free heuristic algorithm that balances performance and safety.
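The constrained problem described here can be written compactly. The sketch below is one possible notation, not necessarily the paper's own: the symbols \pi (policy), V^{\pi} (original value function), \rho^{\pi} (risk), and \omega (user-specified risk threshold) are assumptions introduced for illustration, as is the choice of a non-strict inequality.

```latex
\max_{\pi} \; V^{\pi}(s)
\quad \text{subject to} \quad
\rho^{\pi}(s) \le \omega,
\qquad
\rho^{\pi}(s) = \Pr\bigl(\text{an error state is entered} \mid s_0 = s,\ \pi\bigr)
```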
Findings
Successfully applied to control of a feed tank system
Handles relaxed assumptions compared to traditional methods
Achieves good performance while respecting risk constraints
Abstract
In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in…
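The two ideas in the abstract, risk as a cumulative return and a weighted combination of value and risk, can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: the one-step `TOY_MDP`, the helper names, the risk signal of 1 on entering an error state, and the combination `xi * value - risk`. How the weight is adapted is not covered in the text above, so `xi` is simply held fixed here. This is not the authors' algorithm.

```python
import random

# Hypothetical one-step MDP, invented for illustration only.
# Each action maps to a list of (probability, reward, enters_error_state).
TOY_MDP = {
    "safe":  [(1.0, 1.0, False)],
    "risky": [(0.8, 2.0, False), (0.2, 0.0, True)],
}

def sample_episode(action, rng):
    """Return (value_return, risk_return) for one episode under `action`."""
    u = rng.random()
    cumulative = 0.0
    for prob, reward, is_error in TOY_MDP[action]:
        cumulative += prob
        if u < cumulative:
            # Assumed risk signal: 1 on entering an error state, else 0,
            # so the expected risk return is the probability of an error.
            return reward, 1.0 if is_error else 0.0
    # Guard against floating-point round-off in the probabilities.
    _, reward, is_error = TOY_MDP[action][-1]
    return reward, 1.0 if is_error else 0.0

def estimate(action, episodes=20000, seed=0):
    """Monte Carlo estimates of (value, risk) for always taking `action`."""
    rng = random.Random(seed)
    total_value = total_risk = 0.0
    for _ in range(episodes):
        value_return, risk_return = sample_episode(action, rng)
        total_value += value_return
        total_risk += risk_return
    return total_value / episodes, total_risk / episodes

def weighted_choice(xi, estimates):
    """Pick the action maximizing xi * value - risk (assumed combination)."""
    return max(estimates, key=lambda a: xi * estimates[a][0] - estimates[a][1])

estimates = {action: estimate(action) for action in TOY_MDP}
```

In this toy setting, a small weight such as `xi = 0.1` makes the risk term dominate and selects the safe action, while `xi = 1.0` favors the higher-value risky action. A constrained method would need to tune the weight so that the selected policy's estimated risk stays under the threshold.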
