Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information

Maria-Florina Balcan; Martino Bernasconi; Matteo Castiglioni; Andrea Celli; Keegan Harris; Zhiwei Steven Wu

arXiv:2502.00204·cs.LG·May 5, 2026

TL;DR

This paper introduces algorithms for online learning in Stackelberg games with side information that achieve $O(T^{1/2})$ regret under bandit feedback, improving on the previous best rate of $O(T^{2/3})$. The algorithms also extend to unknown leader utility functions and to auction settings.

Contribution

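The authors develop new algorithms for the leader that achieve $O(T^{1/2})$ regret in Stackelberg games with side information under bandit feedback, down from the previously best-known $O(T^{2/3})$. The algorithms extend to unknown leader utilities and to applications such as bidding in second-price auctions.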
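
To give a sense of scale for the $O(T^{1/2})$ rate relative to the $O(T^{2/3})$ rate that the abstract cites as the previous best, here is a rough worked comparison. It ignores constants and any lower-order factors hidden by the asymptotic notation, so the numbers are illustrative only.

$$\frac{T^{2/3}}{T^{1/2}} = T^{1/6}, \qquad T = 10^{6} \;\Rightarrow\; T^{1/2} = 10^{3}, \quad T^{2/3} = 10^{4}, \quad T^{1/6} = 10.$$

Under these simplifications, the gap between the two bounds grows as $T^{1/6}$, which is a factor of 10 at one million rounds.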

Findings

01. Achieved $O(T^{1/2})$ regret in Stackelberg games under bandit feedback.

02. Extended the algorithms to unknown utility functions.

03. Empirically outperformed previous methods in simulations.

Abstract

We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. In every round the leader observes contextual information and commits to a mixed strategy, after which the follower best-responds. We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback, an improvement from the previously best-known rates of $O(T^{2/3})$. Our algorithms rely on a reduction to linear contextual bandits in the utility space: in each round, a linear contextual bandit algorithm recommends a utility vector, which our algorithm inverts to determine the leader's mixed strategy. We extend our algorithms to the setting in which the leader's utility function is unknown, and also apply it to the problems of bidding in second-price auctions with side information and online Bayesian persuasion with public and…
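
The per-round interaction described in the abstract (the leader sees a context, commits to a mixed strategy, and the follower best-responds) can be sketched as below. This is a minimal toy illustration, not the authors' algorithm: the payoff arrays, the names `FOLLOWER_PAYOFF`, `LEADER_TENSOR`, `follower_best_response`, and `play_round`, the assumption that the leader's payoffs are linear in the context, and the lowest-index tie-breaking rule are all hypothetical choices made for the example. It does not model the bandit feedback structure or the utility-space inversion step.

```python
import numpy as np

# Hypothetical toy instance (not from the paper):
# 2 leader actions, 2 follower actions, 2-dimensional context.
FOLLOWER_PAYOFF = np.array([[1.0, 0.0],
                            [0.0, 1.0]])  # rows: leader action, cols: follower action
LEADER_TENSOR = np.array([[[1.0, 0.0],
                           [0.0, 2.0]],
                          [[0.0, 1.0],
                           [1.0, 0.0]]])  # shape (context_dim, n_leader, n_follower)


def follower_best_response(x, follower_payoff):
    """Return the follower action with the highest expected payoff against x."""
    expected = x @ follower_payoff  # one expected payoff per follower action
    # np.argmax returns the first maximizer; this tie-breaking is an assumption.
    return int(np.argmax(expected))


def play_round(context, x, leader_tensor, follower_payoff):
    """Simulate one round: the leader commits to x, the follower best-responds."""
    # Assumption for illustration: leader payoffs are linear in the context.
    leader_payoff = np.tensordot(context, leader_tensor, axes=1)  # (n_leader, n_follower)
    a_f = follower_best_response(x, follower_payoff)
    utility = float(x @ leader_payoff[:, a_f])  # leader's expected utility
    return a_f, utility


context = np.array([1.0, 0.5])   # side information observed by the leader
x = np.array([0.25, 0.75])       # leader's mixed strategy over its two actions
a_f, utility = play_round(context, x, LEADER_TENSOR, FOLLOWER_PAYOFF)
print(a_f, utility)
```

A learning algorithm would sit around `play_round`, choosing `x` each round from past observations. The abstract's approach picks a target utility vector with a linear contextual bandit and then recovers a mixed strategy from it, which this sketch leaves out.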


Videos

Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information · SlidesLive