Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence
Jiduan Wu, Anas Barakat, Ilyas Fatkhullin, Niao He

TL;DR
This paper introduces a simplified nested zeroth-order algorithm for zero-sum linear quadratic games, achieving improved sample complexity and guaranteed last-iterate convergence in both deterministic and model-free settings.
Contribution
It presents the first global last-iterate linear convergence result for zero-sum LQ games and enhances sample efficiency with a novel nested ZO algorithm.
Findings
Achieves $\widetilde{O}(\epsilon^{-2})$ sample complexity in the model-free setting.
Establishes the first last-iterate linear convergence for zero-sum LQ games.
Improves sample complexity by several orders of magnitude over the nested model-free NPG algorithms of Zhang et al.
Abstract
Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control and can be used (i) as a dynamic game formulation for risk-sensitive or robust control and (ii) as a benchmark setting for multi-agent reinforcement learning with two competing agents in continuous state-control spaces. In contrast to the well-studied single-agent linear quadratic regulator problem, zero-sum LQ games entail solving a challenging nonconvex-nonconcave min-max problem with an objective function that lacks coercivity. Recently, Zhang et al. showed that an $\epsilon$-Nash equilibrium (NE) of finite horizon zero-sum LQ games can be learned via nested model-free Natural Policy Gradient (NPG) algorithms with $\mathrm{poly}(1/\epsilon)$ sample complexity. In this work, we propose a simpler nested Zeroth-Order (ZO) algorithm improving sample complexity by several orders of magnitude and guaranteeing convergence of the…
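To make the nested zeroth-order idea concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm. It assumes a hypothetical one-step scalar game with illustrative constants (`A`, `B`, `D`, `Q`, `RU`, `RW`, `X0`), linear policies u = -k x and w = -l x, and a two-point estimator, which in one dimension coincides with a central finite difference. The function names `cost`, `zo_grad`, `inner_max`, and `nested_zo` are invented for illustration. The inner loop runs ZO gradient ascent on the adversary gain `l` for a fixed `k`; the outer loop runs ZO gradient descent on `k`, using only cost evaluations.

```python
import random

# Hypothetical one-step scalar zero-sum LQ game (all constants are illustrative).
A, B, D = 1.0, 1.0, 0.5      # dynamics: x1 = A*x0 + B*u + D*w
Q, RU, RW = 1.0, 1.0, 1.0    # cost weights; RW > Q*D**2 keeps J concave in l
X0 = 1.0                     # fixed initial state

def cost(k, l):
    """Cost J(k, l) under linear policies u = -k*x0 and w = -l*x0."""
    u, w = -k * X0, -l * X0
    x1 = A * X0 + B * u + D * w
    return Q * x1**2 + RU * u**2 - RW * w**2

def zo_grad(f, theta, radius, rng):
    """Two-point zeroth-order estimate of f'(theta) from function values only."""
    u = rng.choice((-1.0, 1.0))  # random direction on the 1-D unit sphere
    return u * (f(theta + radius * u) - f(theta - radius * u)) / (2.0 * radius)

def inner_max(k, rng, steps=40, lr=0.4, radius=0.05):
    """Inner loop: ZO gradient ascent on the adversary gain l for fixed k."""
    l = 0.0
    for _ in range(steps):
        l += lr * zo_grad(lambda z: cost(k, z), l, radius, rng)
    return l

def nested_zo(outer_steps=60, lr=0.1, radius=0.05, seed=0):
    """Outer loop: ZO gradient descent on k using phi(k) = J(k, inner_max(k))."""
    rng = random.Random(seed)
    k = 0.0
    for _ in range(outer_steps):
        phi = lambda z: cost(z, inner_max(z, rng))
        k -= lr * zo_grad(phi, k, radius, rng)
    return k, inner_max(k, rng)

k_hat, l_hat = nested_zo()
```

For these constants the saddle point can be worked out by hand as k = 4/7 and l = -2/7, which gives a reference to compare `k_hat` and `l_hat` against. The finite-horizon, matrix-valued, model-free setting treated in the paper requires sampling random perturbation directions and handling noisy cost evaluations, none of which this toy captures.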
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
