The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback

Côme Fiegel; Pierre Ménard; Tadashi Kozuno; Michal Valko; Vianney Perchet

arXiv:2604.16087·cs.LG·April 20, 2026

TL;DR

This paper investigates last-iterate convergence in zero-sum games with bandit feedback, establishing fundamental limits and proposing algorithms that achieve near-optimal convergence rates without communication.

Contribution

It identifies the optimal convergence rate for uncoupled algorithms in bandit settings and introduces two algorithms that attain this rate up to logarithmic factors.

Findings

01

The best achievable last-iterate convergence rate for uncoupled algorithms in this setting is $\Omega(T^{-1/4})$.

02

The proposed algorithms match this rate up to constant and logarithmic factors.

03

Guarantees are provided without communication between players.

Abstract

We study the problem of learning in zero-sum matrix games with repeated play and bandit feedback. Specifically, we focus on developing uncoupled algorithms that guarantee, without communication between players, the convergence of the last iterate to a Nash equilibrium. Although the non-bandit case has been studied extensively, this setting has only been explored recently, with a bound of $O(T^{-1/8})$ on the exploitability gap. We show that, for uncoupled algorithms, guaranteeing convergence of the policy profiles to a Nash equilibrium is detrimental to the performance, with the best attainable rate being $\Omega(T^{-1/4})$ in contrast to the usual $\Omega(T^{-1/2})$ rate for convergence of the average iterates. We then propose two algorithms that achieve this optimal rate up to constant and logarithmic factors. The first algorithm leverages a straightforward trade-off between…
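
To make the exploitability gap mentioned in the abstract concrete, here is a minimal Python sketch of how it can be computed for a given policy profile in a zero-sum matrix game. It is an illustration under stated assumptions, not the paper's code: the function name `exploitability_gap`, the convention that the row player maximizes $x^\top A y$ while the column player minimizes it, and the matching-pennies payoff matrix are all choices made here, and the paper's exact definition or normalization may differ.

```python
import numpy as np


def exploitability_gap(A, x, y):
    """Exploitability of the profile (x, y) in a zero-sum matrix game.

    Assumed convention (not taken from the paper): the row player picks x
    to maximize x^T A y, the column player picks y to minimize it.
    The gap is max_i (A y)_i - min_j (x^T A)_j; it is non-negative and
    equals zero exactly when (x, y) is a Nash equilibrium.
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_row_value = np.max(A @ y)  # value of the row player's best response to y
    best_col_value = np.min(x @ A)  # value of the column player's best response to x
    return float(best_row_value - best_col_value)


# Matching pennies, used here purely as a toy example.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
pure = np.array([1.0, 0.0])

gap_nash = exploitability_gap(A, uniform, uniform)  # uniform play is the equilibrium
gap_pure = exploitability_gap(A, pure, pure)        # both players commit to action 0
print(gap_nash, gap_pure)
```

In this reading, a last-iterate guarantee bounds this quantity for the policies actually played at round $T$, whereas an average-iterate guarantee bounds it only for the time-averaged policies.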
