Playing against no-regret players
Maurizio D'Andrea (ANITI, TSE)

TL;DR
This paper investigates how a human player can optimally interact with multiple no-regret algorithms in repeated games, introducing the correlated Stackelberg equilibrium to guarantee utility in multi-player settings.
Contribution
It extends the concept of Stackelberg equilibrium to multi-player games with no-regret learners, proposing the correlated Stackelberg equilibrium and proving utility guarantees.
Findings
The optimizer can guarantee at least the correlated Stackelberg value per round.
The result holds almost surely for the optimizer's utility.
Counterexamples show previous guarantees do not extend to multiple learners.
Abstract
In an increasing variety of contexts, a human player has to interact with artificial players that make decisions following decision-making algorithms. How should the human player play against these algorithms to maximize his utility? Does anything change if he faces one or more artificial players? The main goal of the paper is to answer these two questions. Consider n-player games in normal form repeated over time, where we call the human player the optimizer and the (n − 1) artificial players the learners. We assume that the learners play no-regret algorithms, a class of algorithms widely used in online learning and decision-making. In these games, we consider the concept of Stackelberg equilibrium. In a recent paper, Deng, Schneider, and Sivan have shown that in a 2-player game the optimizer can always guarantee an expected cumulative utility of at least the Stackelberg value per…
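To make the 2-player result concrete, here is a minimal sketch under stated assumptions: a hypothetical 2x2 game (not taken from the paper), an optimizer with only two actions, learner ties broken in the optimizer's favour, and a learner running Hedge (exponential weights) on full-information expected rewards as one example of a no-regret algorithm. The function names `stackelberg_value` and `play_against_hedge` are illustrative. It computes the Stackelberg value, then lets the optimizer commit every round to a slightly perturbed Stackelberg strategy so that the learner's best response is unique. It illustrates only the 2-player setting of Deng, Schneider, and Sivan, not the paper's correlated Stackelberg equilibrium.

```python
import math

# Hypothetical 2x2 example game (not from the paper).
# Rows: optimizer actions U, D. Columns: learner actions L, R.
A = [[1.0, 3.0], [0.0, 2.0]]  # optimizer payoffs
B = [[1.0, 0.0], [0.0, 1.0]]  # learner payoffs, in [0, 1]


def expected(M, p, j):
    """Expected payoff of column j when the optimizer plays row 0 w.p. p."""
    return p * M[0][j] + (1 - p) * M[1][j]


def stackelberg_value(A, B, tol=1e-12):
    """Stackelberg value for a 2-action optimizer, ties broken in its favour.

    For each learner action j, the set of p where j is a best response is an
    interval whose endpoints lie in {0, 1} or at an indifference point between
    two learner actions. The optimizer's payoff is linear in p, so checking
    those candidate points is enough.
    """
    m = len(B[0])
    candidates = {0.0, 1.0}
    for j in range(m):
        for k in range(j + 1, m):
            # Solve p*B0j + (1-p)*B1j = p*B0k + (1-p)*B1k for p.
            denom = (B[0][j] - B[1][j]) - (B[0][k] - B[1][k])
            if abs(denom) > tol:
                p = (B[1][k] - B[1][j]) / denom
                if 0.0 <= p <= 1.0:
                    candidates.add(p)
    best = -math.inf
    for p in candidates:
        top = max(expected(B, p, j) for j in range(m))
        for j in range(m):
            if expected(B, p, j) >= top - 1e-9:  # j is a best response at p
                best = max(best, expected(A, p, j))
    return best


def play_against_hedge(A, B, p, T):
    """Optimizer commits to p every round; learner runs Hedge on expected rewards.

    Returns the optimizer's average expected utility over T rounds.
    """
    m = len(B[0])
    eta = math.sqrt(8 * math.log(m) / T)  # standard horizon-tuned step size
    cum = [0.0] * m  # learner's cumulative reward for each action
    total = 0.0
    for _ in range(T):
        # Exponential weights, shifted by the max for numerical stability.
        top = max(cum)
        w = [math.exp(eta * (c - top)) for c in cum]
        z = sum(w)
        q = [wi / z for wi in w]
        total += sum(q[j] * expected(A, p, j) for j in range(m))
        for j in range(m):
            cum[j] += expected(B, p, j)
    return total / T


if __name__ == "__main__":
    print(stackelberg_value(A, B))  # 2.5, reached at p = 0.5 with learner playing R
    # Commit slightly below the indifference point so R is the unique best response.
    print(play_against_hedge(A, B, p=0.45, T=10_000))
```

In this toy game the committed strategy p = 0.45 is worth 2.45 per round once the learner settles on R, so the Hedge learner's early exploration leaves the optimizer's average somewhat below that, and below the Stackelberg value of 2.5. Shrinking the perturbation and lengthening the horizon brings the average closer to the Stackelberg value, which is the intuition behind the 2-player guarantee.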
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Optimization and Search Problems
