Exploiting No-Regret Algorithms in System Design
Le Cong Dinh, Nick Bishop, Long Tran-Thanh

TL;DR
This paper explores how a system designer can strategically craft payoff matrices in zero-sum games to steer a no-regret learning opponent towards favorable strategies, combining game design with learning algorithms.
Contribution
It introduces a novel method for designing payoff matrices with unique minimax solutions and proposes an algorithm to guide no-regret learners towards these solutions.
Findings
Proposed a payoff matrix design with a unique minimax solution.
Developed an algorithm to steer no-regret players to desired strategies.
Proved convergence of the algorithm to minimax solutions.
Abstract
We investigate a repeated two-player zero-sum game setting where the column player is also a designer of the system, and has full control over the design of the payoff matrix. In addition, the row player uses a no-regret algorithm to efficiently learn how to adapt their strategy to the column player's behaviour over time in order to achieve good total payoff. The goal of the column player is to guide her opponent to pick a mixed strategy which is favourable for the system designer. Therefore, she needs to: (i) design an appropriate payoff matrix whose unique minimax solution contains the desired mixed strategy of the row player; and (ii) strategically interact with the row player during a sequence of plays in order to guide her opponent to converge to that desired behaviour. To design such a payoff matrix, we propose a novel solution that provably has a unique minimax solution with…
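To make the setting concrete, here is a minimal sketch of a no-regret learner in a repeated zero-sum game. It is not the paper's steering algorithm or its payoff-matrix design. It uses Hedge (multiplicative weights), a standard no-regret algorithm, for both players, and relies on the classical fact that the time-averaged strategies of two no-regret players in a zero-sum game approach a minimax solution. The function name `hedge_self_play`, the 2x2 matrix, and the convention that the row player maximizes while the column player minimizes are all assumptions made for illustration.

```python
import numpy as np


def hedge_self_play(A, T=20000):
    """Run Hedge for both players on payoff matrix A (row player maximizes).

    Returns the time-averaged mixed strategies of the row and column player.
    Hypothetical helper for illustration; not taken from the paper.
    """
    A = np.asarray(A, dtype=float)
    n_rows, n_cols = A.shape

    # Rescale payoffs to [0, 1] so the textbook step size applies.
    # Assumes A is not a constant matrix.
    G = (A - A.min()) / (A.max() - A.min())

    eta_row = np.sqrt(8.0 * np.log(n_rows) / T)
    eta_col = np.sqrt(8.0 * np.log(n_cols) / T)

    def softmax(z):
        z = z - z.max()  # subtract the max for numerical stability
        w = np.exp(z)
        return w / w.sum()

    cum_row = np.zeros(n_rows)  # cumulative expected payoff of each row
    cum_col = np.zeros(n_cols)  # cumulative expected payoff conceded by each column
    avg_x = np.zeros(n_rows)
    avg_y = np.zeros(n_cols)

    for _ in range(T):
        x = softmax(eta_row * cum_row)    # row player favours high-payoff rows
        y = softmax(-eta_col * cum_col)   # column player favours low-payoff columns
        avg_x += x
        avg_y += y
        cum_row += G @ y                  # full-information feedback for the row player
        cum_col += x @ G                  # full-information feedback for the column player

    return avg_x / T, avg_y / T


# Illustrative game whose unique minimax solution is x* = y* = (0.4, 0.6).
A = [[2.0, -1.0], [-1.0, 1.0]]
x_bar, y_bar = hedge_self_play(A)
print(x_bar, y_bar)  # both averages should be close to (0.4, 0.6)
```

In this sketch only the averaged strategies are guaranteed to approach the minimax solution, and only because both sides are running a no-regret algorithm. The abstract addresses a different question: how a column player who also designs the matrix can guide the row player's mixed strategy itself toward a chosen target.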
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Game Theory and Applications
