Learning in Bayesian Stackelberg Games With Unknown Follower's Types
Matteo Bollini, Francesco Bacchiocchi, Samuel Coutts, Matteo Castiglioni, Alberto Marchesi

TL;DR
This paper addresses online learning in Bayesian Stackelberg games with unknown follower types. This setting is more realistic than previous models, which assume the follower types' payoffs are known. The paper proposes algorithms that achieve sublinear regret when the leader also observes the follower's type at the end of each round.
Contribution
It introduces the first algorithms for Bayesian Stackelberg games with unknown follower types, achieving near-optimal regret bounds under type feedback.
Findings
No-regret is impossible with only action feedback.
With type feedback, where the follower's type is also revealed at the end of each round, no-regret learning becomes possible.
The proposed algorithm attains $\tilde{O}(\sqrt{T})$ regret.
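As a reference point for these findings, the leader's regret over $T$ rounds can be written as below. The notation is our own hypothetical shorthand, not the paper's: $x \in \Delta_m$ is the leader's mixed commitment over $m$ actions, $\lambda$ is the unknown type distribution, $b_\theta(x)$ is the best response of type $\theta$ to $x$, and $u^L$ is the leader's expected utility. The benchmark is always playing an optimal commitment computed with knowledge of the game.

$$
R_T \;=\; T \cdot \sup_{x \in \Delta_m} \mathbb{E}_{\theta \sim \lambda}\big[u^L(x, b_\theta(x))\big] \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} u^L\big(x_t, b_{\theta_t}(x_t)\big)\Big]
$$

A bound $R_T = \tilde{O}(\sqrt{T})$ means that the average regret $R_T / T$ vanishes at rate $\tilde{O}(1/\sqrt{T})$, ignoring logarithmic factors and the dependence on the other game parameters.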
Abstract
We study online learning in Bayesian Stackelberg games, where a leader repeatedly interacts with a follower whose unknown private type is independently drawn at each round from an unknown probability distribution. The goal is to design algorithms that minimize the leader's regret with respect to always playing an optimal commitment computed with knowledge of the game. We consider, for the first time to the best of our knowledge, the most realistic case in which the leader does not know anything about the follower's types, i.e., the possible follower payoffs. This raises considerable additional challenges compared to the commonly studied case in which the payoffs of follower types are known. First, we prove a strong negative result: no-regret is unattainable under action feedback, i.e., when the leader only observes the follower's best response at the end of each round. Thus, we focus on…
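To make the interaction protocol and the two feedback models concrete, here is a minimal simulation of the environment side, written under stated assumptions. The toy payoffs, the type names `A`/`B`, and the function names (`best_response`, `play_round`, `empirical_types`) are hypothetical. Ties are assumed to be broken in the leader's favor. This is not the paper's algorithm. It only shows what the leader observes under action feedback versus type feedback.

```python
import random
from collections import Counter

# Hypothetical toy instance (not from the paper): 2 leader actions,
# 2 follower actions, 2 follower types. Matrices are indexed [leader][follower].
LEADER_U = [[1.0, 0.0], [0.0, 1.0]]
FOLLOWER_U = {
    "A": [[1.0, 0.0], [0.0, 1.0]],  # type A prefers to match the leader
    "B": [[0.0, 1.0], [1.0, 0.0]],  # type B prefers to mismatch
}
# Type distribution. In the paper's setting the leader knows neither this
# nor FOLLOWER_U; only the simulated environment uses them.
TYPE_PROBS = {"A": 0.7, "B": 0.3}


def expected(x, matrix, a):
    """Expected payoff of follower action a when the leader mixes with x."""
    return sum(x[i] * matrix[i][a] for i in range(len(x)))


def best_response(theta, x):
    """Best response of type theta to commitment x; ties are broken in the
    leader's favor (an assumption of this sketch)."""
    u = FOLLOWER_U[theta]
    actions = range(len(u[0]))
    best = max(expected(x, u, a) for a in actions)
    ties = [a for a in actions if expected(x, u, a) >= best - 1e-12]
    return max(ties, key=lambda a: expected(x, LEADER_U, a))


def play_round(x, rng, feedback="type"):
    """One round: nature draws a type, the follower best-responds, and the
    leader receives its utility plus the observation allowed by `feedback`."""
    types = list(TYPE_PROBS)
    theta = rng.choices(types, weights=[TYPE_PROBS[t] for t in types])[0]
    a = best_response(theta, x)
    observation = {"action": a}  # action feedback: only the best response
    if feedback == "type":
        observation["type"] = theta  # type feedback also reveals the type
    return expected(x, LEADER_U, a), observation


def empirical_types(observations):
    """Frequency estimate of the type distribution (requires type feedback)."""
    counts = Counter(obs["type"] for obs in observations)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


rng = random.Random(0)
x = [1.0, 0.0]  # a fixed, arbitrary commitment for illustration
history = [play_round(x, rng)[1] for _ in range(10_000)]
print(empirical_types(history))  # approximately {"A": 0.7, "B": 0.3}
```

Under type feedback the leader can estimate type frequencies from its history, as above. That alone does not yield an optimal commitment, because the leader still does not know each type's payoffs and must learn how each type responds to different commitments.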
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Reinforcement Learning in Robotics
