Best of Both Worlds: Regret Minimization versus Minimax Play
Adrian Müller, Jon Schneider, Stratis Skoulakis, Luca Viano, Volkan Cevher

TL;DR
This paper introduces online learning algorithms that achieve low regret against a specific comparator while maintaining near-optimal regret against any fixed strategy, effectively combining regret minimization and minimax strategies in game settings.
Contribution
It provides the first algorithms that, under bandit feedback, simultaneously guarantee low regret against a given comparator strategy and against any fixed strategy, provided the comparator supports every action (assigns each action positive probability).
Findings
Algorithms achieve $O(1)$ regret against a comparator strategy.
Algorithms guarantee $\tilde{O}(\sqrt{T})$ regret against fixed strategies.
In zero-sum games with min-max value zero, the methods risk only a bounded loss while still gaining from exploitable opponents.
Abstract
In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy, and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy supports every action. In the context of zero-sum games with min-max value zero, both in normal- and extensive form, we show that our results allow us to guarantee to risk at most $O(1)$ loss while being able to gain $\Omega(T)$ from exploitable opponents, thereby combining the benefits of both no-regret algorithms and minimax play.
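The two regret notions in the abstract can be made concrete with a small sketch. The code below is not the paper's algorithm; it only evaluates, for a given sequence of plays, the regret against a comparator strategy and the regret against the best fixed action in hindsight. The function names (`expected_loss`, `regrets`), the rock-paper-scissors loss convention (win = -1, tie = 0, loss = +1), and the always-rock opponent are illustrative assumptions, and the full loss vectors are used for evaluation only (a bandit learner would observe just the loss of the action it played).

```python
def expected_loss(strategy, loss_vector):
    """Expected loss of a mixed strategy against one round's loss vector."""
    return sum(p * l for p, l in zip(strategy, loss_vector))


def regrets(played, losses, comparator):
    """Return (regret vs. comparator, regret vs. best fixed action).

    played:     list of mixed strategies used by the learner, one per round
    losses:     list of loss vectors, one per round (hypothetical full information,
                used here only to evaluate regret after the fact)
    comparator: the fixed mixed strategy to compare against
    """
    n_actions = len(comparator)
    learner_total = sum(expected_loss(x, l) for x, l in zip(played, losses))
    comparator_total = sum(expected_loss(comparator, l) for l in losses)
    # Losses are linear in the strategy, so the best fixed strategy
    # in hindsight is attained by a single (pure) action.
    cumulative = [sum(l[a] for l in losses) for a in range(n_actions)]
    best_fixed_total = min(cumulative)
    return learner_total - comparator_total, learner_total - best_fixed_total


if __name__ == "__main__":
    T = 100
    uniform = [1 / 3, 1 / 3, 1 / 3]   # minimax strategy in rock-paper-scissors
    vs_rock = [0, -1, 1]              # learner's losses for (rock, paper, scissors)
    r_comp, r_best = regrets([uniform] * T, [vs_rock] * T, uniform)
    print("regret vs comparator:", r_comp)
    print("regret vs best fixed action:", r_best)
```

In this toy setting, always playing the uniform (minimax) strategy has zero regret against the comparator but regret growing linearly in $T$ against the best fixed action (always paper), which is the gap the paper's guarantees are meant to close.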
Taxonomy
Topics: Machine Learning and Data Classification
