Mean-Variance Efficient Reinforcement Learning with Applications to Dynamic Financial Investment
Masahiro Kato, Kei Nakagawa, Kenshi Abe, Tetsuro Morimura, and Kentaro Baba

TL;DR
This paper introduces a computationally efficient reinforcement learning method that optimizes the mean-variance trade-off, enabling the derivation of Pareto-efficient policies for dynamic financial investment.
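In this setting, a policy is Pareto-efficient (MV-efficient) when no other policy offers at least the same mean reward with no more variance while being strictly better in one of the two. The Python sketch below applies that definition to a list of (mean, variance) pairs. The function name `mv_efficient` and the candidate numbers are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Tuple


def mv_efficient(points: List[Tuple[float, float]]) -> List[int]:
    """Return indices of (mean, variance) pairs not dominated by any other pair.

    Pair j dominates pair i if mean_j >= mean_i and var_j <= var_i,
    with at least one of the two inequalities strict.
    """
    efficient = []
    for i, (m_i, v_i) in enumerate(points):
        dominated = any(
            m_j >= m_i and v_j <= v_i and (m_j > m_i or v_j < v_i)
            for j, (m_j, v_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            efficient.append(i)
    return efficient


# Hypothetical (mean, variance) pairs for four candidate policies.
candidates = [(1.0, 0.5), (1.2, 0.9), (0.9, 0.6), (1.2, 1.1)]
print(mv_efficient(candidates))  # prints [0, 1]
```

Policy 2 is dominated by policy 0 (higher mean, lower variance), and policy 3 is dominated by policy 1 (same mean, lower variance). Exact ties do not dominate each other, so both tied policies are kept.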
Contribution
The paper proposes a novel utility-maximization approach that avoids estimating the gradient of the variance, improving computational efficiency over previous constrained-optimization methods.
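A score-function (REINFORCE) estimator helps show why a quadratic utility can sidestep variance-gradient estimation. The utility is applied to each sampled reward, so ∇E[u(R)] = E[u(R) ∇log π(a)] can be estimated without bias from one sample at a time. By contrast, Var(R) = E[R²] - (E[R])² contains a squared expectation. Its gradient, 2E[R]∇E[R], is a product of two expectations.

The sketch below trains a softmax policy on a toy two-armed Gaussian bandit. The following are all assumptions for illustration, not the authors' algorithm:

- the utility form u(R) = R - λR²,
- the weight λ = 0.5,
- the learning rate and step count,
- the arm parameters.

```python
import math
import random
from typing import List


def softmax(theta: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def quadratic_utility(r: float, lam: float) -> float:
    # Assumed per-sample utility: a weighted sum of R (first moment)
    # and R^2 (second moment), with a negative weight on R^2.
    return r - lam * r * r


def train(arm_means: List[float], arm_stds: List[float],
          lam: float = 0.5, lr: float = 0.05,
          steps: int = 20000, seed: int = 0) -> List[float]:
    """REINFORCE on a Gaussian bandit, maximizing E[u(R)] (toy sketch)."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        r = rng.gauss(arm_means[a], arm_stds[a])
        u = quadratic_utility(r, lam)
        # Single-sample gradient estimate u(R) * grad log pi(a).
        # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k).
        for k in range(len(theta)):
            theta[k] += lr * u * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)


# Two arms with equal mean reward; arm 0 has much lower variance.
print(train([1.0, 1.0], [0.2, 1.5]))
```

Both arms have the same mean, so the expected utility E[u(R)] = μ - λ(σ² + μ²) is higher for the low-variance arm. The learned probabilities should concentrate on arm 0.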
Findings
The method successfully identifies MV-efficient policies.
It outperforms existing approaches in computational efficiency.
Experimental results validate the approach's effectiveness.
Abstract
This study investigates the mean-variance (MV) trade-off in reinforcement learning (RL), an instance of sequential decision-making under uncertainty. Our objective is to obtain MV-efficient policies, whose means and variances lie on the Pareto-efficient frontier of the MV trade-off; on this frontier, any increase in the expected reward necessitates a corresponding increase in variance, and vice versa. To this end, we propose a method that trains a policy to maximize the expected quadratic utility, defined as a weighted sum of the first and second moments of the rewards obtained under the policy. We then show that the maximizer is indeed an MV-efficient policy. Previous studies that employed constrained optimization to address the MV trade-off have encountered computational challenges. However, our approach is more…
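Writing the second moment as variance plus squared mean gives a short argument for why a quadratic-utility maximizer lands on the frontier. The derivation below makes two assumptions; the paper's exact utility and conditions may differ:

- the utility is U(π) = E_π[R] - λE_π[R²], with a hypothetical weight λ > 0;
- attention is restricted to means at or below 1/(2λ), where the mean term is increasing.

```latex
% Assumed utility: U(\pi) = \mathbb{E}_\pi[R] - \lambda\,\mathbb{E}_\pi[R^2], \quad \lambda > 0
% Notation: \mu_\pi = \mathbb{E}_\pi[R], \quad \sigma_\pi^2 = \operatorname{Var}_\pi(R)
\begin{align}
\mathbb{E}_\pi[R^2] &= \sigma_\pi^2 + \mu_\pi^2 \\
U(\pi) &= \underbrace{\mu_\pi - \lambda\mu_\pi^2}_{g(\mu_\pi)} - \lambda\,\sigma_\pi^2 \\
g'(\mu) &= 1 - 2\lambda\mu > 0 \quad \text{for } \mu < \tfrac{1}{2\lambda} \\
U(\pi') - U(\pi^*) &= \bigl[g(\mu_{\pi'}) - g(\mu_{\pi^*})\bigr] + \lambda\bigl(\sigma_{\pi^*}^2 - \sigma_{\pi'}^2\bigr)
\end{align}
```

The function g is strictly increasing on (-∞, 1/(2λ)]. Suppose μ_{π*} ≤ μ_{π'} ≤ 1/(2λ) and σ²_{π'} ≤ σ²_{π*}, with at least one inequality strict. Then both bracketed terms are nonnegative and one is positive, so U(π') > U(π*). Hence no policy in that region can dominate a maximizer π*.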
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Traffic control and management
