Efficient Risk-Averse Reinforcement Learning
Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

TL;DR
This paper introduces CeSoR, a novel risk-averse reinforcement learning method that combines a soft risk mechanism with a cross-entropy sampling module, improving sample efficiency and overcoming local optima in risk-sensitive tasks.
Contribution
The paper proposes CeSoR, an algorithm that separates the risk aversion of the sampler from that of the policy optimizer, improving both risk aversion and sample efficiency across several RL benchmarks.
Findings
CeSoR outperforms standard risk-averse methods in maze navigation and autonomous driving.
The soft risk mechanism helps bypass local optima barriers.
Cross entropy sampling improves sample efficiency and risk handling.
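To make the cross-entropy sampling idea concrete, here is a minimal sketch of one generic cross-entropy-method update, not the paper's module. It assumes each episode is governed by a single scalar environment condition drawn from a Gaussian, and it refits that Gaussian to the conditions that produced the lowest returns. The function name `ce_update` and the parameters `elite_fraction` and `min_std` are hypothetical.

```python
import math
import statistics


def ce_update(conditions, returns, elite_fraction=0.2, min_std=1e-3):
    """One generic cross-entropy-method step (illustrative, not the paper's code).

    conditions: scalar environment conditions sampled for each episode.
    returns:    the return obtained in each of those episodes.
    Returns the (mean, std) of a Gaussian refit to the worst-return conditions.
    """
    if len(conditions) != len(returns):
        raise ValueError("conditions and returns must have the same length")
    if not conditions:
        raise ValueError("need at least one episode")

    # Number of "elite" episodes; here elite means lowest return.
    # The small epsilon guards against float error pushing ceil() up by one.
    n_elite = math.ceil(elite_fraction * len(conditions) - 1e-9)
    n_elite = min(len(conditions), max(2, n_elite))

    # Rank episodes from worst return to best and keep the worst ones.
    ranked = sorted(zip(returns, conditions), key=lambda pair: pair[0])
    elites = [condition for _, condition in ranked[:n_elite]]

    # Refit the sampling distribution to the elite conditions.
    mean = statistics.fmean(elites)
    std = max(min_std, statistics.pstdev(elites))
    return mean, std
```

In a training loop, the next batch of conditions could be drawn with `random.gauss(mean, std)`. A complete method would also have to account for the fact that episodes no longer come from the original condition distribution; that correction is not shown here.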
Abstract
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns. A risk measure often focuses on the worst returns out of the agent's experience. As a result, standard methods for risk-averse RL often ignore high-return strategies. We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it. We also devise a novel Cross Entropy module for risk sampling, which (1) preserves risk aversion despite the soft risk; (2) independently improves sample efficiency. By separating the risk aversion of the sampler and the optimizer, we can sample episodes with poor conditions, yet optimize with respect to successful strategies. We combine these two concepts in CeSoR (Cross-entropy Soft-Risk optimization algorithm), which can be applied on top of any risk-averse policy gradient (PG)…
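As an illustration of a risk measure that "focuses on the worst returns", the sketch below computes an empirical Conditional Value at Risk (CVaR), the mean of the worst alpha-fraction of returns. The choice of CVaR is an assumption made for this example; the abstract does not name a specific measure. The second function is a hypothetical linear schedule that starts at risk level 1 (all episodes count) and tightens toward a target level, which is one plausible reading of a "soft risk" mechanism. The names `empirical_cvar`, `soft_risk_level` and `warmup_fraction` are illustrative.

```python
import math


def empirical_cvar(returns, alpha):
    """Mean of the worst ceil(alpha * n) returns (lower tail of the sample)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    worst_first = sorted(returns)
    # Small epsilon guards against float error pushing ceil() up by one.
    k = max(1, math.ceil(alpha * len(worst_first) - 1e-9))
    return sum(worst_first[:k]) / k


def soft_risk_level(step, total_steps, target_alpha, warmup_fraction=0.8):
    """Hypothetical schedule: risk level decays linearly from 1.0 to target_alpha.

    Early in training every episode contributes (level 1.0, risk-neutral);
    after warmup_fraction of training only the target tail is used.
    """
    warmup_steps = max(1, int(warmup_fraction * total_steps))
    progress = min(1.0, step / warmup_steps)
    return 1.0 - progress * (1.0 - target_alpha)
```

With a fixed small alpha, episodes that score well are dropped from the objective from the first step. A schedule like `soft_risk_level` keeps them in play early on, matching the abstract's concern that standard methods "often ignore high-return strategies".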
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
