Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions

Soumita Hait, Ping Li, Haipeng Luo, Mengxiao Zhang

arXiv:2605.09363 · cs.LG · May 12, 2026

TL;DR

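This paper shows that in two-player zero-sum games with bandit feedback, players who also observe the opponent's action can achieve a near-optimal t^(-1/2) last-iterate convergence rate with high probability using an efficient algorithm. This beats the t^(-1/4) high-probability limit that Fiegel et al. (2025) establish for bandit feedback alone.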

Contribution

The authors introduce an algorithm that exploits opponent action feedback to attain t^(-1/2) last-iterate convergence in duality gap for two-player zero-sum games, together with an analysis that overcomes obstacles arising in standard multi-armed bandit arguments.

Findings

01

Achieves t^(-1/2) last-iterate convergence with high probability.

02

Develops a new analysis that overcomes obstacles encountered in standard multi-armed bandit analyses.

03

Experiments show faster convergence than naive baselines and prior methods.

Abstract

Last-iterate convergence of learning dynamics in games has attracted significant recent attention. In two-player zero-sum games with bandit feedback, where only the loss of the selected action pair is observed, Fiegel et al. (2025) show a separation between average-iterate and last-iterate convergence in duality gap: while the optimal t^(-1/2) rate after t rounds is achievable for the former via standard no-regret algorithms, the latter cannot converge faster than t^(-1/3) in expectation or t^(-1/4) with high probability. However, in many practical settings, such as preference learning, the players observe not only their loss but also the opponent's action. This raises a natural question: can such additional information enable faster last-iterate convergence? We answer this question affirmatively, showing that t^(-1/2) last-iterate convergence is achievable with high probability in…
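The two notions the abstract relies on, the duality gap and the difference between bandit feedback and feedback that includes the opponent's action, can be made concrete with a small sketch. This is an illustration only, not the paper's algorithm. It assumes a loss matrix `A` in which the row player minimizes and the column player maximizes the expected loss, and it omits observation noise. The names `duality_gap` and `play_round` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def duality_gap(A: Sequence[Sequence[float]],
                x: Sequence[float],
                y: Sequence[float]) -> float:
    """Duality gap of the mixed-strategy pair (x, y).

    Assumed convention (not taken from the paper): the row player
    minimizes x^T A y and the column player maximizes it. The gap is
    max_j (x^T A)_j - min_i (A y)_i, which is nonnegative and equals
    zero exactly at a Nash equilibrium.
    """
    n, m = len(A), len(A[0])
    # Expected loss of each pure row action against the column strategy y.
    Ay: List[float] = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    # Expected loss of each pure column action against the row strategy x.
    xA: List[float] = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
    return max(xA) - min(Ay)


def play_round(A: Sequence[Sequence[float]],
               x: Sequence[float],
               y: Sequence[float],
               rng: random.Random) -> Tuple[int, int, float]:
    """Sample one round: both players draw an action from their strategy.

    Under pure bandit feedback the row player would see only (i, loss).
    With opponent-action feedback it also sees j. Noise is omitted here
    as a simplifying assumption.
    """
    i = rng.choices(range(len(x)), weights=x)[0]
    j = rng.choices(range(len(y)), weights=y)[0]
    return i, j, A[i][j]


# Matching pennies: the uniform strategy pair is the unique equilibrium.
A = [[1.0, -1.0], [-1.0, 1.0]]
gap_at_equilibrium = duality_gap(A, [0.5, 0.5], [0.5, 0.5])
gap_off_equilibrium = duality_gap(A, [1.0, 0.0], [0.5, 0.5])
i, j, loss = play_round(A, [0.5, 0.5], [0.5, 0.5], random.Random(0))
```

A last-iterate guarantee bounds `duality_gap` at the strategy pair played in round t itself, whereas an average-iterate guarantee bounds it only at the running average of the strategies played so far.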
