Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions
Soumita Hait, Ping Li, Haipeng Luo, Mengxiao Zhang

TL;DR
This paper shows that in two-player zero-sum games where each player observes the opponent's action in addition to bandit feedback, an efficient algorithm achieves a near-optimal t^(-1/2) last-iterate convergence rate with high probability, beating the t^(-1/4) high-probability limit that holds under bandit feedback alone.
Contribution
The authors introduce an algorithm that uses opponent action feedback to attain t^(-1/2) last-iterate convergence in duality gap for two-player zero-sum games, together with an analysis that overcomes obstacles arising in the standard multi-armed bandit setting.
Findings
Achieves t^(-1/2) last-iterate convergence with high probability.
Develops a new analysis overcoming standard multi-armed bandit obstacles.
Experiments show faster convergence than naive baselines and prior methods.
Abstract
Last-iterate convergence of learning dynamics in games has attracted significant recent attention. In two-player zero-sum games with bandit feedback, where only the loss of the selected action pair is observed, Fiegel et al. (2025) show a separation between average-iterate and last-iterate convergence in duality gap: while the optimal t^(-1/2) rate after t rounds is achievable for the former via standard no-regret algorithms, the latter cannot converge faster than t^(-1/3) in expectation or t^(-1/4) with high probability. However, in many practical settings, such as preference learning, the players observe not only their loss but also the opponent's action. This raises a natural question: can such additional information enable faster last-iterate convergence? We answer this question affirmatively, showing that t^(-1/2) last-iterate convergence is achievable with high probability in…
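The convergence rates above are stated in terms of the duality gap of the players' current strategy pair. As a minimal sketch of that quantity (not the paper's algorithm), the snippet below computes the duality gap for a small matrix game. The function name `duality_gap`, the loss-matrix convention (row player minimizes, column player maximizes), and the matching-pennies example are assumptions made for illustration and do not come from the paper.

```python
def duality_gap(A, x, y):
    """Duality gap of the mixed-strategy pair (x, y) in a matrix game.

    Assumed convention (illustrative, not taken from the paper):
    the row player draws i ~ x and suffers loss A[i][j], while the
    column player draws j ~ y and gains A[i][j].
    """
    n_rows, n_cols = len(A), len(A[0])
    # Expected loss of each pure row action against the column mix y.
    row_losses = [sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
    # Expected loss of the row mix x against each pure column action.
    col_losses = [sum(x[i] * A[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Value of the best column response to x minus the best row response to y.
    return max(col_losses) - min(row_losses)


# Matching pennies with losses in [0, 1]; the uniform pair is the equilibrium.
A = [[1.0, 0.0], [0.0, 1.0]]
print(duality_gap(A, [0.5, 0.5], [0.5, 0.5]))  # 0.0
print(duality_gap(A, [1.0, 0.0], [0.5, 0.5]))  # 0.5
```

The gap is zero exactly at a Nash equilibrium and positive otherwise, so a last-iterate rate of t^(-1/2) says that the gap of the strategy pair played at round t itself, rather than of the time-averaged pair, shrinks at that rate.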
