Reinforcement Learning Method for Zero-Sum Linear-Quadratic Stochastic Differential Games in Infinite Horizons

Yiyuan Wang

arXiv:2602.08075·math.OC·February 10, 2026

Open Access

TL;DR

This paper introduces a reinforcement learning framework for infinite-horizon zero-sum linear-quadratic stochastic differential games. The framework solves these games without complete knowledge of the system parameters, and the paper proves convergence of the resulting algorithms and validates them with numerical simulations.

Contribution

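The paper presents itself as the first to develop RL algorithms tailored to zero-sum linear-quadratic stochastic differential games. It derives semi-model-based and model-free algorithms by combining an iterative solution scheme with dynamic programming principles, and it provides a convergence analysis.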

Findings

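01. The proposed algorithms converge under appropriate rank conditions on data sampling.

02. Numerical simulations verify the effectiveness of the algorithms.

03. The framework handles settings in which accurate system parameters are difficult to obtain.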

Abstract

In this work, we propose, for the first time, a reinforcement learning framework specifically designed for zero-sum linear-quadratic stochastic differential games. This approach offers a generalized solution for scenarios in which accurate system parameters are difficult to obtain, thereby overcoming a key limitation of traditional iterative methods that rely on complete system information. In correspondence with the game-theoretic algebraic Riccati equations associated with the problem, we develop both semi-model-based and model-free reinforcement learning algorithms by combining an iterative solution scheme with dynamic programming principles. Notably, under appropriate rank conditions on data sampling, the convergence of the proposed algorithms is rigorously established through theoretical analysis. Finally, numerical simulations are conducted to verify the effectiveness and…
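The abstract refers to game-theoretic algebraic Riccati equations and an iterative solution scheme without stating them. As a rough illustration only, the sketch below runs a fully model-based iteration on one common form of such an equation. Everything specific here is an assumption and not taken from the paper: the dynamics dX = (A X + B1 u + B2 v) dt + C X dW, the running cost x'Qx + u'R1 u - v'R2 v, the simultaneous gain update, the zero initial gains, and all matrix values. It is the kind of full-information baseline that the paper's semi-model-based and model-free algorithms are meant to replace, not the paper's algorithm.

```python
import numpy as np

def solve_generalized_lyapunov(A_cl, C, M):
    """Solve A_cl' P + P A_cl + C' P C + M = 0 for P by vectorization."""
    n = A_cl.shape[0]
    I = np.eye(n)
    # Column-major identity: vec(X P Y) = kron(Y', X) vec(P)
    lhs = np.kron(I, A_cl.T) + np.kron(A_cl.T, I) + np.kron(C.T, C.T)
    p = np.linalg.solve(lhs, -M.reshape(-1, order="F"))
    P = p.reshape(n, n, order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry

def gare_residual(P, A, B1, B2, C, Q, R1, R2):
    """Residual of the assumed game-theoretic algebraic Riccati equation."""
    return (A.T @ P + P @ A + C.T @ P @ C + Q
            - P @ B1 @ np.linalg.solve(R1, B1.T) @ P
            + P @ B2 @ np.linalg.solve(R2, B2.T) @ P)

def iterate_gare(A, B1, B2, C, Q, R1, R2, max_iter=50, tol=1e-12):
    """Evaluate the current gain pair, then update both gains from P."""
    K = np.zeros((B1.shape[1], A.shape[0]))  # minimizer: u = -K x
    L = np.zeros((B2.shape[1], A.shape[0]))  # maximizer: v = +L x
    P = np.zeros_like(A)
    for _ in range(max_iter):
        A_cl = A - B1 @ K + B2 @ L
        M = Q + K.T @ R1 @ K - L.T @ R2 @ L
        P_new = solve_generalized_lyapunov(A_cl, C, M)
        K = np.linalg.solve(R1, B1.T @ P_new)
        L = np.linalg.solve(R2, B2.T @ P_new)
        if np.linalg.norm(P_new - P) < tol:
            P = P_new
            break
        P = P_new
    return P, K, L

# Hypothetical example data, chosen so that zero gains are already stabilizing.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.5], [0.0]])
C = 0.2 * np.eye(2)
Q = np.eye(2)
R1 = np.array([[1.0]])
R2 = np.array([[4.0]])

P, K, L = iterate_gare(A, B1, B2, C, Q, R1, R2)
residual_norm = np.linalg.norm(gare_residual(P, A, B1, B2, C, Q, R1, R2))
print("P =", P)
print("GARE residual norm =", residual_norm)
```

Every step of this sketch uses A, B1, B2, and C explicitly, which is exactly the complete system information the abstract says is often unavailable. A simultaneous update of both gains is not guaranteed to converge in general; the example data are deliberately mild.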

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Risk and Portfolio Optimization