A Natural Actor-Critic Algorithm with Downside Risk Constraints
Thomas Spooner, Rahul Savani

TL;DR
This paper introduces a new risk-sensitive reinforcement learning algorithm that efficiently estimates downside risk using a novel Bellman equation, improving sample efficiency and stability in constrained policy optimization.
Contribution
It proposes a new Bellman equation that upper bounds the lower partial moment of the return, proves that this proxy is a contraction, and extends an actor-critic method with natural policy gradients to risk-sensitive control.
Findings
Effective on three benchmark problems
Improved sample efficiency and stability
Demonstrates practical utility of the new risk proxy
Abstract
Existing work on risk-sensitive reinforcement learning, both for symmetric and downside risk measures, has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also suffers from high variance and decreased sample efficiency compared to temporal-difference methods. In this paper, we study prediction and control with aversion to downside risk, which we gauge by the lower partial moment of the return. We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity. We prove that this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition. This allows sample-efficient, on-line estimation of partial moments. For risk-sensitive control, we instantiate Reward Constrained Policy Optimization, a…
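For readers unfamiliar with the risk measure named in the abstract, the following is a minimal sketch of the lower partial moment of the return, estimated by direct Monte-Carlo averaging over sampled episodes. This is the high-variance baseline the abstract contrasts against, not the paper's Bellman-equation proxy. The function names, the target value, the moment order, and the discount factor are all illustrative assumptions and do not come from the paper.

```python
import numpy as np


def discounted_return(rewards, gamma=0.99):
    """Discounted sum of one episode's rewards (gamma is an assumed default)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def lower_partial_moment(returns, target=0.0, order=2):
    """Monte-Carlo estimate of E[max(target - G, 0) ** order].

    Only returns that fall below `target` contribute, so upside
    deviations are not penalised (unlike the variance).
    """
    returns = np.asarray(returns, dtype=float)
    shortfall = np.maximum(target - returns, 0.0)
    return float(np.mean(shortfall ** order))


# Hypothetical reward sequences from three sampled episodes.
episodes = [[1.0, 1.0, 1.0], [0.0, -2.0, 0.5], [-1.0, -1.0, -1.0]]
returns = [discounted_return(ep, gamma=0.9) for ep in episodes]
risk = lower_partial_moment(returns, target=0.0, order=2)
```

Because the estimate needs complete episode returns, it cannot be updated on-line step by step, which is the limitation that motivates a temporal-difference treatment.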
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization
