Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning
Xiaoteng Ma, Shuai Ma, Li Xia, Qianchuan Zhao

TL;DR
This paper introduces risk-averse reinforcement learning methods optimizing the mean-semivariance criterion, addressing the challenge of semivariance's time-inconsistency with novel algorithms validated through diverse experiments.
Contribution
The paper develops a new theoretical framework for mean-semivariance (MSV) optimization in RL using Perturbation Analysis and proposes two practical on-policy algorithms, one based on policy gradients and one on trust regions.
Findings
The proposed algorithms outperform baseline methods in various tasks.
MSV optimization improves risk management in RL.
Proposed methods are effective in both simple and complex environments.
Abstract
Keeping risk under control is often more crucial than maximizing expected rewards in real-world decision-making settings such as finance, robotics, and autonomous driving. The most natural choice of risk measure is variance, which penalizes upside volatility as much as downside volatility. In contrast, the (downside) semivariance, which captures the negative deviation of a random variable below its mean, is more suitable for risk-averse purposes. This paper aims to optimize the mean-semivariance (MSV) criterion in reinforcement learning w.r.t. the steady-state reward distribution. Since semivariance is time-inconsistent and does not satisfy the standard Bellman equation, traditional dynamic programming methods cannot be applied to MSV problems directly. To tackle this challenge, we resort to Perturbation Analysis (PA) theory and establish the performance difference formula for MSV. We…
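To make the distinction between variance and downside semivariance concrete, here is a minimal sketch that computes both on a sample of rewards. The function names, the trade-off coefficient `beta`, and the "mean minus beta times semivariance" form of the objective are illustrative assumptions, not the paper's exact notation or algorithm.

```python
import numpy as np


def variance(rewards):
    """Mean squared deviation from the sample mean (penalizes both sides)."""
    r = np.asarray(rewards, dtype=float)
    return float(np.mean((r - r.mean()) ** 2))


def downside_semivariance(rewards):
    """Mean squared deviation counting only rewards below the sample mean."""
    r = np.asarray(rewards, dtype=float)
    shortfall = np.minimum(r - r.mean(), 0.0)  # zero wherever reward >= mean
    return float(np.mean(shortfall ** 2))


def mean_semivariance(rewards, beta=1.0):
    """Assumed MSV objective: mean reward minus beta times semivariance."""
    r = np.asarray(rewards, dtype=float)
    return float(r.mean() - beta * downside_semivariance(r))


# Two hypothetical reward samples with identical variance:
upside = [0.0, 0.0, 0.0, 4.0]    # rare large gain
downside = [4.0, 4.0, 4.0, 0.0]  # rare large loss

for name, sample in [("upside", upside), ("downside", downside)]:
    print(name, variance(sample), downside_semivariance(sample))
```

Both samples have the same variance, but the sample with the rare large loss has a larger semivariance, which is the asymmetry the abstract describes.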
