Risk-Averse Trust Region Optimization for Reward-Volatility Reduction
Lorenzo Bisi, Luca Sabbioni, Edoardo Vittori, Matteo Papini, Marcello Restelli

TL;DR
This paper introduces a novel risk measure called reward volatility for reinforcement learning, and develops a risk-averse policy optimization method that reduces reward volatility to control return variance, with applications in financial simulations.
Contribution
It proposes a new reward volatility measure, derives a policy gradient theorem based on it, and adapts TRPO for risk-averse optimization in RL.
Findings
Reward volatility bounds return variance, enabling risk control.
The proposed method effectively reduces reward volatility in financial simulations.
The approach maintains monotonic policy improvement guarantees.
Abstract
In real-world decision-making problems, for instance in the fields of finance, robotics or autonomous driving, keeping uncertainty under control is as important as maximizing expected returns. Risk aversion has been addressed in the reinforcement learning literature through risk measures related to the variance of returns. However, in many cases, the risk is measured not only on a long-term perspective, but also on the step-wise rewards (e.g., in trading, to ensure the stability of the investment bank, it is essential to monitor the risk of portfolio positions on a daily basis). In this paper, we define a novel measure of risk, which we call reward volatility, consisting of the variance of the rewards under the state-occupancy measure. We show that the reward volatility bounds the return variance so that reducing the former also constrains the latter. We derive a policy gradient theorem…
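The definition in the abstract can be made concrete with a minimal sketch. It assumes a discounted occupancy measure, approximated by weighting each sampled reward by gamma^t and normalizing the weights to sum to one. The function names, the discount factor, and the self-normalized estimator are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def reward_volatility(trajectories, gamma):
    """Estimate the variance of step rewards under a discounted occupancy.

    Assumption: each reward at step t gets weight gamma**t, and the weights
    are normalized to sum to one across all sampled steps.
    """
    weights, rewards = [], []
    for traj in trajectories:
        r = np.asarray(traj, dtype=float)
        weights.append(gamma ** np.arange(len(r)))
        rewards.append(r)
    w = np.concatenate(weights)
    r = np.concatenate(rewards)
    w = w / w.sum()
    mean_reward = np.sum(w * r)  # occupancy-weighted mean reward
    return float(np.sum(w * (r - mean_reward) ** 2))


def return_variance(trajectories, gamma):
    """Empirical variance of the discounted return across trajectories."""
    returns = [
        np.sum(gamma ** np.arange(len(traj)) * np.asarray(traj, dtype=float))
        for traj in trajectories
    ]
    return float(np.var(returns))


if __name__ == "__main__":
    # Hypothetical data: every episode gains 1, then loses 1.
    episodes = [[1.0, -1.0], [1.0, -1.0]]
    print("reward volatility:", reward_volatility(episodes, gamma=1.0))
    print("return variance:  ", return_variance(episodes, gamma=1.0))
```

In this toy case every episode has the same return, so the return variance is zero while the step-wise rewards still fluctuate. This is the kind of per-step risk the abstract says a return-based measure can miss.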
Taxonomy
Methods: Test · Trust Region Policy Optimization
