Adversarial Learning in Games with Bandit Feedback: Logarithmic Pure-Strategy Maximin Regret
Shinji Ito, Haipeng Luo, Arnab Maiti, Taira Tsuchiya, Yue Wu

TL;DR
This paper develops algorithms for zero-sum games with bandit feedback, achieving logarithmic regret bounds against the maximin pure strategy, and extends these results to bilinear games with large action sets.
Contribution
It analyzes the existing Tsallis-INF algorithm in the uninformed bandit feedback setting and introduces the Maximin-UCB algorithm for the informed setting, providing the first logarithmic regret bounds in these challenging scenarios.
Findings
Tsallis-INF achieves $O(c \, \log T)$ regret in uninformed bandit settings, where $c$ is a game-dependent constant.
Maximin-UCB attains $O(c' \, \log T)$ regret in informed bandit settings, where $c'$ is a different game-dependent constant.
Extensions to bilinear games with large action spaces maintain similar regret bounds.
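As background for the first finding, Tsallis-INF is commonly described as follow-the-regularized-leader with a 1/2-Tsallis entropy regularizer and importance-weighted loss estimates. The sketch below computes its sampling distribution and applies one estimate update. It is a minimal illustration, not the paper's implementation: the function names, the bisection solver, and the use of losses rather than payoffs are all assumptions, and how the paper instantiates the algorithm in the game setting is not stated in this summary.

```python
import math
from typing import List, Sequence


def tsallis_inf_probs(cum_loss_est: Sequence[float], eta: float,
                      iters: int = 100) -> List[float]:
    """Sampling distribution for FTRL with the 1/2-Tsallis entropy.

    Finds x < min(L) with sum_i 4 / (eta * (L_i - x))**2 == 1 by bisection
    (hypothetical solver choice; Newton's method is another option).
    """
    k = len(cum_loss_est)
    l_min = min(cum_loss_est)
    lo = l_min - 2.0 * math.sqrt(k) / eta  # weights sum to at most 1 here
    hi = l_min - 2.0 / eta                 # weights sum to at least 1 here

    def total(x: float) -> float:
        return sum(4.0 / (eta * (l - x)) ** 2 for l in cum_loss_est)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            hi = mid
        else:
            lo = mid

    x = 0.5 * (lo + hi)
    w = [4.0 / (eta * (l - x)) ** 2 for l in cum_loss_est]
    s = sum(w)
    return [wi / s for wi in w]  # renormalize away bisection error


def iw_update(cum_loss_est: List[float], arm: int, loss: float,
              probs: Sequence[float]) -> None:
    """Importance-weighted update: only the played arm's estimate changes."""
    cum_loss_est[arm] += loss / probs[arm]


# Equal estimates give the uniform distribution.
uniform = tsallis_inf_probs([0.0, 0.0, 0.0, 0.0], eta=1.0)
# An arm with a larger estimated loss is sampled less often.
skewed = tsallis_inf_probs([0.0, 5.0], eta=1.0)
```

In a full loop, the learning rate would decay over rounds (a schedule proportional to $1/\sqrt{t}$ is a common choice and is assumed here, not taken from this summary), with `tsallis_inf_probs` called once per round before sampling an action.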
Abstract
Learning to play zero-sum games is a fundamental problem in game theory and machine learning. While significant progress has been made in minimizing external regret in the self-play settings or with full-information feedback, real-world applications often force learners to play against unknown, arbitrary opponents and restrict learners to bandit feedback where only the payoff of the realized action is observable. In such challenging settings, it is well-known that $\Omega(\sqrt{T})$ external regret is unavoidable (where $T$ is the number of rounds). To overcome this barrier, we investigate adversarial learning in zero-sum games under bandit feedback, aiming to minimize the deficit against the maximin pure strategy -- a metric we term Pure-Strategy Maximin Regret. We analyze this problem under two bandit feedback models: uninformed (only the realized reward is revealed) and informed…
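To make the metric concrete, here is a minimal sketch. It assumes the learner is the row player of a payoff matrix `A` whose entries are the learner's payoffs, and it takes Pure-Strategy Maximin Regret to be $T$ times the pure maximin value $\max_i \min_j A_{ij}$ minus the payoff actually collected. The precise definition used in the paper (for example, whether an expectation is taken) is not given in this summary, so the formula and the function names below are illustrative assumptions.

```python
from typing import Sequence, Tuple


def pure_maximin_value(A: Sequence[Sequence[float]]) -> float:
    """Best payoff a single fixed row guarantees against any column."""
    return max(min(row) for row in A)


def pure_strategy_maximin_regret(A: Sequence[Sequence[float]],
                                 plays: Sequence[Tuple[int, int]]) -> float:
    """Deficit of the collected payoff against the pure maximin benchmark.

    `plays` lists the (row, column) pairs realized in each round
    (hypothetical input format chosen for this sketch).
    """
    T = len(plays)
    collected = sum(A[i][j] for i, j in plays)
    return T * pure_maximin_value(A) - collected


# Row 0 guarantees 0.5; row 1 guarantees only 0.0, so the benchmark is 0.5.
A = [[0.5, 0.75],
     [1.0, 0.0]]
plays = [(0, 0), (1, 1), (1, 1), (0, 1)]
regret = pure_strategy_maximin_regret(A, plays)  # 4 * 0.5 - 1.25 = 0.75
```

The benchmark here is a fixed pure row rather than a mixed strategy, which is what distinguishes this quantity from regret against the game's mixed maximin value.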
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Stochastic Gradient Optimization Techniques
