A non-zero-sum game with reinforcement learning under mean-variance framework
Junyi Guo, Xia Han, Hao Wang, Kam Chuen Yuen

TL;DR
This paper models competition between two investors as a non-zero-sum differential game in a reinforcement learning setting, derives time-consistent Nash equilibrium strategies under mean-variance objectives, and proposes an RL algorithm whose effectiveness it demonstrates numerically.
Contribution
It introduces an RL-based approach for finding Nash equilibria in non-zero-sum mean-variance games when the market model parameters are partially or completely unknown.
Findings
Explicit analytical solutions under Gaussian assumptions
A practical RL algorithm with proven convergence
Numerical results showing robustness and effectiveness
Abstract
In this paper, we investigate a competitive market involving two agents who consider both their own wealth and the wealth gap with their opponent. Both agents can invest in a financial market consisting of a risk-free asset and a risky asset, under conditions where model parameters are partially or completely unknown. This setup gives rise to a non-zero-sum differential game within the framework of reinforcement learning (RL). Each agent aims to maximize his own Choquet-regularized, time-inconsistent mean-variance objective. Adopting the dynamic programming approach, we derive a time-consistent Nash equilibrium strategy in a general incomplete market setting. Under the additional assumption of a Gaussian mean return model, we obtain an explicit analytical solution, which facilitates the development of a practical RL algorithm. Notably, the proposed algorithm achieves uniform…
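To make the setup in the abstract concrete, here is a minimal Python sketch of two agents trading one risky and one risk-free asset, each scored by a mean-variance criterion on a mix of own wealth and the gap to the opponent. Everything specific in it is an assumption for illustration rather than the paper's model: the constant-coefficient wealth dynamics, the constant dollar allocations `u1` and `u2`, the relative-performance form `own - k * other`, and the parameter values. It omits the Choquet regularizer and the learning step entirely.

```python
import numpy as np

def simulate_terminal_wealth(u1, u2, x0=(1.0, 1.0), mu=0.08, sigma=0.2, r=0.02,
                             T=1.0, n_steps=50, n_paths=20000, seed=0):
    """Euler simulation of dX = (r*X + (mu - r)*u) dt + sigma*u dW for both agents.

    u1, u2 are constant dollar amounts held in the risky asset (hypothetical
    strategies); both agents face the same Brownian shock.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.tile(np.asarray(x0, dtype=float), (n_paths, 1))  # shape (n_paths, 2)
    u = np.array([u1, u2], dtype=float)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, 1))  # shared shock per path
        x = x + (r * x + (mu - r) * u) * dt + sigma * u * dw
    return x[:, 0], x[:, 1]

def mean_variance_objective(own, other, k, gamma):
    """Assumed criterion: E[Z] - (gamma / 2) * Var[Z] with Z = own - k * other.

    k in [0, 1] weights the wealth gap; k = 0 recovers a single-agent
    mean-variance objective on own wealth.
    """
    z = own - k * other
    return z.mean() - 0.5 * gamma * z.var()

if __name__ == "__main__":
    x1, x2 = simulate_terminal_wealth(u1=0.5, u2=1.0)
    print("agent 1:", mean_variance_objective(x1, x2, k=0.3, gamma=2.0))
    print("agent 2:", mean_variance_objective(x2, x1, k=0.3, gamma=2.0))
```

Because each agent's score depends on the other's allocation through `other`, neither can optimize in isolation, which is what makes the problem a game rather than two separate portfolio problems. Sweeping `u1` for a fixed `u2` with this sketch gives a crude view of one agent's best response under the stated assumptions.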
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
