Mean-Variance Efficient Reinforcement Learning with Applications to Dynamic Financial Investment

Masahiro Kato; Kei Nakagawa; Kenshi Abe; Tetsuro Morimura; Kentaro Baba

arXiv:2010.01404·cs.LG·November 14, 2024·1 cites

TL;DR

This paper introduces a computationally efficient reinforcement learning method that optimizes mean-variance trade-offs, enabling the derivation of Pareto efficient policies for dynamic financial investment applications.

Contribution

It proposes a novel utility maximization approach that avoids variance gradient estimation, improving computational efficiency over previous constrained optimization methods.

Findings

1. The method successfully identifies MV-efficient policies.
2. It outperforms existing approaches in computational efficiency.
3. Experimental results validate the approach's effectiveness.

Abstract

This study investigates the mean-variance (MV) trade-off in reinforcement learning (RL), an instance of sequential decision-making under uncertainty. Our objective is to obtain MV-efficient policies, whose means and variances lie on the Pareto efficient frontier with respect to the MV trade-off; under this condition, any increase in the expected reward necessitates a corresponding increase in variance, and vice versa. To this end, we propose a method that trains a policy to maximize the expected quadratic utility, defined as a weighted sum of the first and second moments of the rewards obtained through the policy. We then show that the maximizer is indeed an MV-efficient policy. Previous studies that employed constrained optimization to address the MV trade-off have encountered computational challenges. However, our approach is more…
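As a rough illustration of the objective described in the abstract, the sketch below estimates an expected quadratic utility from a sample of returns. The parameterization `alpha * E[R] - (beta / 2) * E[R^2]`, the function names, and the default weights are assumptions made for this example; the abstract only states that the utility is a weighted sum of the first and second moments. The second function rewrites the same quantity in terms of mean and variance, using the identity E[R^2] = Var[R] + E[R]^2, which is one way to see why maximizing such a utility trades expected reward against variance.

```python
from typing import Sequence


def expected_quadratic_utility(
    returns: Sequence[float], alpha: float = 1.0, beta: float = 0.5
) -> float:
    """Sample estimate of alpha * E[R] - (beta / 2) * E[R^2].

    alpha and beta are hypothetical weights chosen for illustration.
    """
    n = len(returns)
    first_moment = sum(returns) / n
    second_moment = sum(r * r for r in returns) / n
    return alpha * first_moment - 0.5 * beta * second_moment


def mean_variance_form(
    returns: Sequence[float], alpha: float = 1.0, beta: float = 0.5
) -> float:
    """Same quantity, written with E[R^2] = Var[R] + E[R]^2."""
    n = len(returns)
    mean = sum(returns) / n
    # Population variance (divide by n) so the identity holds exactly.
    variance = sum((r - mean) ** 2 for r in returns) / n
    return alpha * mean - 0.5 * beta * (variance + mean ** 2)
```

Under these assumed weights, two return samples with the same mean are ranked by variance: `expected_quadratic_utility([2.0, 2.0, 2.0])` is larger than `expected_quadratic_utility([1.0, 2.0, 3.0])`, because the second sample has a larger second moment.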

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Traffic control and management