Hierarchies of No-regret Algorithms
R. Xu, E. Yachbes, J. Zhang

TL;DR
This paper investigates the hierarchy of no-regret algorithms in two-player games, revealing counterintuitive utility outcomes and the impact of learning rates on algorithm performance.
Contribution
The paper shows that, surprisingly, no-swap-regret algorithms can be the worse choice for the players using them, and it attempts to equalize learning rates so that no-regret and no-swap-regret players can be compared more fairly.
Findings
No-swap-regret algorithms can yield higher opponent utility in many games.
Slower learning rates of no-swap-regret algorithms explain their poorer performance.
In certain random games, no-swap-regret algorithms outperform no-regret algorithms.
Abstract
Our paper studies the setting of players using no-regret algorithms in various two-player games. We address whether having stronger regret guarantees or playing against an opponent with weaker regret guarantees yields higher utilities for the player in question. We consider a hierarchy of algorithms from weakest to strongest: uniform random play, no-regret, and no-swap-regret. We find, counterintuitively, that in many games, no-swap-regret is a worse choice for players (and gives better utility for their opponents). We find the root cause of this phenomenon to be a difference in effective learning rate between the two algorithms, where the no-swap-regret algorithms learn more slowly than no-regret algorithms. To address this, we attempt to equalize learning rates, leading to closer utility between no-regret and no-swap-regret players. Finally, we show that for certain random games…
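To make the hierarchy in the abstract concrete, the following is a minimal sketch of how external regret and swap regret differ on a recorded sequence of play. It is not the paper's code. It assumes Hedge (multiplicative weights) as the no-regret learner, a uniform random opponent, and rock-paper-scissors as the game; the source names none of these, and the function names `hedge_weights`, `play`, `external_regret`, and `swap_regret` are hypothetical.

```python
import math
import random

def hedge_weights(cum_utils, eta):
    """Hedge: probability of each action is proportional to exp(eta * cumulative utility)."""
    m = max(cum_utils)  # subtract the max for numerical stability
    w = [math.exp(eta * (u - m)) for u in cum_utils]
    total = sum(w)
    return [x / total for x in w]

def play(payoff, rounds, eta, seed=0):
    """Row player runs Hedge (assumed learner); column player is uniform random.

    payoff[i][j] is the row player's utility. Returns a list of (row, col) actions.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    cum = [0.0] * n_rows
    history = []
    for _ in range(rounds):
        probs = hedge_weights(cum, eta)
        i = rng.choices(range(n_rows), weights=probs)[0]
        j = rng.randrange(n_cols)
        history.append((i, j))
        # Full-information update: credit each row with what it would have earned.
        for a in range(n_rows):
            cum[a] += payoff[a][j]
    return history

def external_regret(payoff, history):
    """Utility of the best single fixed row in hindsight, minus realized utility."""
    realized = sum(payoff[i][j] for i, j in history)
    best = max(sum(payoff[a][j] for _, j in history) for a in range(len(payoff)))
    return best - realized

def swap_regret(payoff, history):
    """Utility of the best per-action replacement i -> f(i) in hindsight, minus realized utility."""
    n = len(payoff)
    total = 0.0
    for i in range(n):
        # Only the rounds in which row i was actually played count toward its swap.
        cols = [j for (r, j) in history if r == i]
        realized = sum(payoff[i][j] for j in cols)
        best = max(sum(payoff[a][j] for j in cols) for a in range(n))
        total += best - realized
    return total

# Hypothetical example game: rock-paper-scissors from the row player's view.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
history = play(RPS, rounds=2000, eta=0.05, seed=1)
ext, swp = external_regret(RPS, history), swap_regret(RPS, history)
```

Swap regret is never smaller than external regret on the same history, because replacing every action with one fixed row is a special case of a swap. That is the sense in which no-swap-regret sits above no-regret in the hierarchy.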
