An Online Learning Approach for Two-Player Zero-Sum Linear Quadratic Games
Shanting Wang, Weihao Sun, Andreas A. Malikopoulos

TL;DR
This paper introduces an online learning framework for two-player zero-sum linear quadratic games with unknown dynamics, combining model estimation, confidence sets, and surrogate models to ensure convergence and stability.
Contribution
It proposes a novel approach integrating regularized least squares, confidence sets, and surrogate model selection for policy updates in unknown dynamic environments.
Findings
The algorithm converges with provable regret bounds.
Numerical experiments confirm the theoretical analysis.
A shrinkage step keeps the surrogate model in the region where the generalized algebraic Riccati equation admits a stabilizing saddle-point solution.
Abstract
In this paper, we present an online learning approach for two-player zero-sum linear quadratic games with unknown dynamics. We develop a framework combining regularized least squares model estimation, high-probability confidence sets, and surrogate model selection to maintain a regular model for policy updates. We apply a shrinkage step at each episode to identify a surrogate model in the region where the generalized algebraic Riccati equation admits a stabilizing saddle-point solution. We then establish a regret analysis of the algorithm's convergence, followed by a numerical example that illustrates the convergence performance and verifies the regret analysis.
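The regularized least squares estimation step mentioned in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the dynamics form x_{t+1} = A x_t + B u_t + C w_t + noise, the matrices A, B, C, the dimensions, the noise level, the random exploratory inputs, and the function name `rls_estimate`. The sketch covers only the ridge estimate and its Gram matrix (the quantity a confidence set would typically be built from), not the shrinkage step or policy update.

```python
import numpy as np

def rls_estimate(Z, X_next, lam=1e-3):
    """Ridge (regularized least squares) estimate of Theta in
    x_next = Theta @ z + noise, where z stacks state and both players' inputs.
    Returns the estimate and the regularized Gram matrix V."""
    d = Z.shape[1]
    V = Z.T @ Z + lam * np.eye(d)                   # regularized Gram matrix
    Theta_hat = np.linalg.solve(V, Z.T @ X_next).T  # shape (n, d)
    return Theta_hat, V

# Hypothetical system, chosen only for illustration.
rng = np.random.default_rng(0)
n, m1, m2, T = 2, 1, 1, 500
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])   # input matrix of player 1 (assumed)
C = np.array([[0.5], [0.0]])   # input matrix of player 2 (assumed)
Theta = np.hstack([A, B, C])   # true parameter, unknown to the learner

x = np.zeros(n)
Z_rows, X_rows = [], []
for _ in range(T):
    u = rng.normal(size=m1)    # exploratory input, player 1
    w = rng.normal(size=m2)    # exploratory input, player 2
    z = np.concatenate([x, u, w])
    x_next = Theta @ z + 0.01 * rng.normal(size=n)
    Z_rows.append(z)
    X_rows.append(x_next)
    x = x_next

Z = np.array(Z_rows)
X_next = np.array(X_rows)
Theta_hat, V = rls_estimate(Z, X_next)
err = np.linalg.norm(Theta_hat - Theta)  # Frobenius-norm estimation error
```

In this toy setting the estimation error shrinks as more transitions are collected, which is the behavior a confidence set around `Theta_hat` is meant to quantify.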
