Learning to Steer Learners in Games
Yizhou Zhang, Yi-An Ma, Eric Mazumdar

TL;DR
This paper investigates how an optimizer can exploit a no-regret learner in repeated games. It shows that steering the learner to a Stackelberg equilibrium is impossible when the optimizer knows only that the learner is no-regret, and it proposes recovering the learner's payoff structure to make exploitation possible.
Contribution
The paper shows that no-regret learners cannot be exploited without additional information about their objectives or algorithm, and it proposes a payoff-recovery approach that works when the learner's algorithm comes from a smaller, more specific class.
Findings
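Steering a no-regret learner to a Stackelberg equilibrium is impossible if the optimizer knows only that the learner uses some no-regret algorithm.
Recovering the learner's payoff structure enables successful exploitation for certain smaller classes of algorithms.
The approach is shown to be effective for learners that use ascent and stochastic mirror ascent algorithms.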
Abstract
We consider the problem of learning to exploit learning algorithms through repeated interactions in games. Specifically, we focus on the case of repeated two-player, finite-action games, in which an optimizer aims to steer a no-regret learner to a Stackelberg equilibrium without knowledge of its payoffs. We first show that this is impossible if the optimizer only knows that the learner is using an algorithm from the general class of no-regret algorithms. This suggests that the optimizer requires more information about the learner's objectives or algorithm to successfully exploit them. Building on this intuition, we reduce the problem for the optimizer to that of recovering the learner's payoff structure. We demonstrate the effectiveness of this approach if the learner's algorithm is drawn from a smaller class by analyzing two examples: one where the learner uses an ascent algorithm, and…
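To make the setting concrete, the following is a minimal sketch of a no-regret learner responding to an optimizer that commits to a fixed mixed strategy. It is an illustration only, not the paper's method: it assumes the learner runs Hedge (multiplicative weights) with full-information expected payoffs, and the payoff matrix `B`, the step size `eta`, and the function name `hedge_learner` are all hypothetical. It shows why the learner's payoffs matter to the optimizer: which action the learner settles on depends on `B`, which the optimizer does not know.

```python
import numpy as np

def hedge_learner(B, x, T=2000, eta=0.1):
    """Hedge (multiplicative weights) for the learner against a fixed
    optimizer mixed strategy x. B[i, j] is the learner's payoff when the
    optimizer plays action i and the learner plays action j.
    Returns the learner's mixed strategy after T rounds."""
    cum_payoff = np.zeros(B.shape[1])
    y = np.full(B.shape[1], 1.0 / B.shape[1])  # start from uniform play
    for _ in range(T):
        # Expected payoff of each learner action against x.
        cum_payoff += x @ B
        # Exponential weights, shifted by the max for numerical stability.
        logits = eta * (cum_payoff - cum_payoff.max())
        y = np.exp(logits)
        y /= y.sum()
    return y

# Hypothetical learner payoffs (an assumption, not taken from the paper).
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Two different optimizer commitments induce different learner responses.
y_a = hedge_learner(B, np.array([0.8, 0.2]))
y_b = hedge_learner(B, np.array([0.5, 0.5]))
```

Under these assumptions, the learner concentrates on its best response to whatever the optimizer commits to, so an optimizer that could infer `B` from observed play could choose the commitment that induces the response it prefers.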
