Adversarial Learning in Games with Bandit Feedback: Logarithmic Pure-Strategy Maximin Regret

Shinji Ito; Haipeng Luo; Arnab Maiti; Taira Tsuchiya; Yue Wu

arXiv:2602.06348 · cs.LG · February 9, 2026

TL;DR

This paper develops algorithms for zero-sum games with bandit feedback, achieving logarithmic regret bounds against the maximin pure strategy, and extends these results to bilinear games with large action sets.

Contribution

It uses the Tsallis-INF algorithm in the uninformed bandit feedback setting and the Maximin-UCB algorithm in the informed setting, providing the first logarithmic regret bounds for these challenging scenarios.
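Tsallis-INF is the bandit algorithm of Zimmert and Seldin. It runs follow-the-regularized-leader with the 1/2-Tsallis entropy over the learner's pure actions. The following is a minimal, hypothetical sketch of that standard algorithm run in a matrix game with uninformed feedback, not the paper's implementation. It makes several assumptions: rewards lie in [0, 1] and are converted to losses as 1 − reward, it uses the plain importance-weighted loss estimator, and the learning rate is η_t = 1/√t. The names `tsallis_weights`, `tsallis_inf_play` and the `opponent` callback are illustrative only.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def tsallis_weights(cum_loss: Sequence[float], eta: float) -> List[float]:
    """FTRL step with regularizer -(4/eta) * sum(sqrt(w_i)) (1/2-Tsallis entropy).

    The minimizer over the simplex has the closed form
    w_i = 4 / (eta * (L_i - x))**2, where the normalizer x < min(L)
    is chosen so the weights sum to 1. We find x by bisection.
    """
    k = len(cum_loss)
    m = min(cum_loss)
    lo = m - 2.0 * math.sqrt(k) / eta  # every term <= 1/k here, so the sum <= 1
    hi = m - 2.0 / eta                 # the argmin term equals 1 here, so the sum >= 1
    for _ in range(100):
        x = 0.5 * (lo + hi)
        total = sum(4.0 / (eta * (l - x)) ** 2 for l in cum_loss)
        if total > 1.0:
            hi = x  # the sum increases in x, so move x down
        else:
            lo = x
    # Use lo (sum <= 1), then renormalize to absorb the remaining bisection error.
    w = [4.0 / (eta * (l - lo)) ** 2 for l in cum_loss]
    s = sum(w)
    return [wi / s for wi in w]


def tsallis_inf_play(
    A: Sequence[Sequence[float]],
    opponent: Callable[[int], int],
    T: int,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Play T rounds of matrix game A as the row player against opponent(t) -> column.

    Only the realized reward A[i][j] feeds the update (uninformed bandit feedback).
    """
    rng = random.Random(seed)
    k = len(A)
    cum_loss = [0.0] * k  # importance-weighted cumulative loss estimates
    rows: List[int] = []
    cols: List[int] = []
    for t in range(1, T + 1):
        eta = 1.0 / math.sqrt(t)  # assumed learning-rate schedule
        w = tsallis_weights(cum_loss, eta)
        i = rng.choices(range(k), weights=w)[0]
        j = opponent(t)
        reward = A[i][j]  # the only quantity the learner observes
        cum_loss[i] += (1.0 - reward) / w[i]  # unbiased estimate of the loss of arm i
        rows.append(i)
        cols.append(j)
    return rows, cols


if __name__ == "__main__":
    A = [[0.5, 0.5], [1.0, 0.0]]
    rows, cols = tsallis_inf_play(A, opponent=lambda t: t % 2, T=1000)
    print("average reward:", sum(A[i][j] for i, j in zip(rows, cols)) / len(rows))
```

The opponent's column is used only to look up the realized entry of `A`, which matches the uninformed model where nothing beyond the realized reward is revealed. The normalizer is found by bisection rather than the Newton iteration often used in practice, trading speed for simplicity.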

Findings

1. Tsallis-INF achieves $O(c \log T)$ pure-strategy maximin regret in the uninformed bandit setting.

2. Maximin-UCB attains $O(c' \log T)$ pure-strategy maximin regret in the informed bandit setting.

3. Extensions to bilinear games with large action spaces maintain regret bounds of a similar form.

Abstract

Learning to play zero-sum games is a fundamental problem in game theory and machine learning. While significant progress has been made in minimizing external regret in self-play settings or with full-information feedback, real-world applications often force learners to play against unknown, arbitrary opponents and restrict them to bandit feedback, where only the payoff of the realized action is observable. In such challenging settings, it is well known that $\Omega(T)$ external regret is unavoidable (where $T$ is the number of rounds). To overcome this barrier, we investigate adversarial learning in zero-sum games under bandit feedback, aiming to minimize the deficit against the maximin pure strategy, a metric we term Pure-Strategy Maximin Regret. We analyze this problem under two bandit feedback models: uninformed (only the realized reward is revealed) and informed…
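The abstract describes Pure-Strategy Maximin Regret only in words. One natural formalization is our assumption, not the paper's stated definition. Treat the learner as the row player of a payoff matrix A whose entries are rewards to maximize. The maximin pure strategy then guarantees v = max_i min_j A[i][j] per round against any opponent. Pure-strategy maximin regret after T rounds is T·v minus the learner's realized total reward. The helper names below are hypothetical.

```python
from typing import Sequence


def pure_maximin_value(A: Sequence[Sequence[float]]) -> float:
    """Best per-round reward the row player can guarantee with one fixed pure
    action against any opponent: max over rows i of min over columns j of A[i][j]."""
    return max(min(row) for row in A)


def pure_maximin_regret(
    A: Sequence[Sequence[float]], rows: Sequence[int], cols: Sequence[int]
) -> float:
    """Assumed definition: T * v_pure minus the learner's realized total reward,
    where rows[t] and cols[t] are the pure actions played in round t."""
    if len(rows) != len(cols):
        raise ValueError("rows and cols must have the same length")
    v_pure = pure_maximin_value(A)
    realized = sum(A[i][j] for i, j in zip(rows, cols))
    return len(rows) * v_pure - realized


A = [[0.5, 0.5], [1.0, 0.0]]  # row 0 guarantees 0.5, row 1 only 0.0
print(pure_maximin_value(A))                   # 0.5
print(pure_maximin_regret(A, [1, 1], [1, 1]))  # 1.0: fell 0.5 short of the benchmark twice
```

Under this definition the regret can be negative when the learner exploits a weak opponent, which is why the benchmark can be beaten even though $\Omega(T)$ external regret cannot be avoided. The sketch measures realized reward, and the paper may instead bound the expected version.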


Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Stochastic Gradient Optimization Techniques