Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

Come Fiegel; Pierre Menard; Tadashi Kozuno; Michal Valko; Vianney Perchet

arXiv:2604.15242·cs.LG·April 17, 2026

TL;DR

This paper shows that online mirror descent with log-barrier regularization attains, with high probability, the optimal last-iterate convergence rate in zero-sum matrix games with bandit feedback, and extends the result to extensive-form games.

Contribution

It combines log-barrier regularization with a dual-focused analysis to match, up to logarithmic factors, the Omega(t^{-1/4}) lower bound on the exploitability gap in these game settings.

Findings

01

Achieves an O-tilde(t^{-1/4}) last-iterate convergence rate with high probability.

02

Extends the approach to extensive-form games with similar convergence guarantees.

03

Provides a new method for obtaining last-iterate convergence with uncoupled learning algorithms.

Abstract

We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound of Omega(t^{-1/4}) on the exploitability gap. Some online mirror descent algorithms have been proposed in the literature for this problem, but none has truly attained this rate yet. We show that the use of a log-barrier regularization, along with a dual-focused analysis, yields O-tilde(t^{-1/4}) convergence with high probability, matching the lower bound up to logarithmic factors. We additionally extend our idea to the setting of extensive-form games, proving a bound with the same rate.
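To make the ingredients named in the abstract concrete, here is a minimal Python sketch of online mirror descent with a log-barrier regularizer under bandit feedback, together with the exploitability gap of a strategy pair. It is an illustration under stated assumptions, not the paper's algorithm: the constant step size `eta`, the importance-weighted loss estimator, the convention that the row player minimizes payoffs in [0, 1], and all function names are choices made here for the example.

```python
import numpy as np

def log_barrier_omd_step(x, loss_est, eta, iters=100):
    """One mirror-descent step on the simplex with regularizer -sum(log x_i).

    The first-order condition gives 1/x_new_i = 1/x_i + eta*loss_est_i - s,
    where the scalar s is chosen so that x_new sums to one.
    """
    a = 1.0 / x + eta * loss_est
    # sum(1/(a - s)) is increasing in s on (-inf, min(a)); bracket the root.
    lo, hi = a.min() - len(x), a.min()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(1.0 / (a - mid)) > 1.0:
            hi = mid
        else:
            lo = mid
    x_new = 1.0 / (a - lo)
    return x_new / x_new.sum()  # absorb the remaining bisection error

def exploitability(A, x, y):
    """Gap of (x, y) when the row player minimizes x^T A y."""
    return float(np.max(x @ A) - np.min(A @ y))

def self_play(A, steps, eta, seed=0):
    """Uncoupled self-play: each player only sees the payoff of the sampled pair."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    for _ in range(steps):
        i = rng.choice(n, p=x)
        j = rng.choice(m, p=y)
        payoff = A[i, j]  # assumed to lie in [0, 1]
        # Importance-weighted estimates (an assumption of this sketch).
        loss_x = np.zeros(n)
        loss_x[i] = payoff / x[i]
        loss_y = np.zeros(m)
        loss_y[j] = (1.0 - payoff) / y[j]
        x = log_barrier_omd_step(x, loss_x, eta)
        y = log_barrier_omd_step(y, loss_y, eta)
    return x, y

# Matching pennies with payoffs in [0, 1]; the uniform pair is the equilibrium.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
x, y = self_play(A, steps=200, eta=0.1)
gap = exploitability(A, x, y)
```

The returned `x` and `y` are the last iterates rather than averages, which is the quantity the exploitability bound in the abstract concerns. The sketch makes no claim about reproducing the stated rate; the step-size schedule and estimator that the analysis relies on are given in the paper itself.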
