Bandit learning in concave $N$-person games

Mario Bravo; David S. Leslie; Panayotis Mertikopoulos

arXiv:1810.01925 · cs.GT · October 5, 2018 · 28 citations

TL;DR

This paper investigates how no-regret learning algorithms driven by bandit feedback behave in non-cooperative concave games. It shows that they converge to Nash equilibrium under a monotonicity condition and bounds the rate of convergence.

Contribution

It shows that mirror descent with bandit feedback converges to Nash equilibrium in concave games that satisfy a monotonicity condition, extending bandit learning theory to multi-agent settings.
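
The idea of bandit mirror descent can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration, not the paper's construction: a hypothetical two-player game on $[-1, 1]$ with payoffs $u_i(x) = -(x_i - c_i)^2 - A x_1 x_2$ (constants `A` and `C`), the Euclidean mirror map (so each update is a greedy projected gradient ascent step), a one-point gradient estimator that perturbs each player's base point by $\pm\delta_n$, and the schedules $\gamma_n = 1/(n+10)$ and $\delta_n = 0.5\,n^{-1/4}$. Each player sees only its own realized payoff. The function name `bandit_mirror_ascent` is hypothetical, and the paper's exact update, feasibility adjustment and parameter schedules may differ.

```python
import random
from typing import Tuple

Vec = Tuple[float, float]

# Hypothetical toy game (not from the paper): player i in {0, 1} picks x_i in [-1, 1]
# and receives u_i(x) = -(x_i - C[i])**2 - A * x_0 * x_1.
# The pseudo-gradient has Jacobian [[-2, -A], [-A, -2]], negative definite for
# |A| < 2, so the game is strongly monotone.
A = 1.0
C = (0.3, -0.2)


def payoff(i: int, x: Vec) -> float:
    return -(x[i] - C[i]) ** 2 - A * x[0] * x[1]


def equilibrium() -> Vec:
    # Interior Nash equilibrium from the first-order conditions
    # 2*x_0 + A*x_1 = 2*C[0] and A*x_0 + 2*x_1 = 2*C[1].
    det = 4.0 - A * A
    return ((4.0 * C[0] - 2.0 * A * C[1]) / det,
            (4.0 * C[1] - 2.0 * A * C[0]) / det)


def bandit_mirror_ascent(steps: int, seed: int = 0) -> Vec:
    """Hypothetical sketch: Euclidean mirror ascent driven only by realized payoffs."""
    rng = random.Random(seed)
    x = [0.0, 0.0]  # base points; the actions actually played are perturbed copies
    for n in range(1, steps + 1):
        gamma = 1.0 / (n + 10)      # assumed step-size schedule
        delta = 0.5 / n ** 0.25     # assumed query radius (shrinks over time)
        # In 1-D the unit sphere is {-1, +1}: each player flips its own sign.
        z = [rng.choice((-1.0, 1.0)) for _ in range(2)]
        played = (x[0] + delta * z[0], x[1] + delta * z[1])
        for i in range(2):
            # One-point estimate of du_i/dx_i from a single observed payoff.
            g_hat = payoff(i, played) * z[i] / delta
            # Greedy Euclidean step, projected onto [-1 + delta, 1 - delta] so the
            # next perturbed action (radius <= delta) stays inside [-1, 1].
            x[i] = min(1.0 - delta, max(-1.0 + delta, x[i] + gamma * g_hat))
    return x[0], x[1]


if __name__ == "__main__":
    print("equilibrium:", equilibrium())
    print("bandit play:", bandit_mirror_ascent(100_000))
```

Because this toy game is strongly monotone, the base points should drift toward the interior equilibrium returned by `equilibrium()`. The single-payoff gradient estimate has variance of order $1/\delta_n^2$, which is why progress is slow and the radius $\delta_n$ is shrunk only gradually.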

Findings

01

Convergence to Nash equilibrium with probability 1 under a standard monotonicity condition.

02

An upper bound on the convergence rate that nearly matches the best attainable rate for single-agent bandit stochastic optimization.

03

Conditions under which no-regret learning stabilizes in multi-agent games, even though in general it may cycle, even with perfect gradient information.
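
The gap between cycling and convergence is already visible with exact gradients. The following sketch is illustrative only and uses two hypothetical toy games on $[-1,1]^2$, both with their equilibrium at the origin: a bilinear zero-sum game ($u_1 = x_1 x_2$, $u_2 = -x_1 x_2$), which is monotone but not strictly so, and a strongly monotone game ($u_1 = -x_1^2 + x_1 x_2$, $u_2 = -x_2^2 - x_1 x_2$). Both players run simultaneous projected gradient ascent with the vanishing step size $\gamma_n = 0.1/\sqrt{n}$; the step size, starting point and function names are assumptions of this example, not taken from the paper.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, float]


def gradient_play(v: Callable[[float, float], Vec], x0: Vec, steps: int) -> Vec:
    """Simultaneous projected gradient ascent on [-1, 1]^2 with exact gradients.

    v(x1, x2) returns the pair (du_1/dx_1, du_2/dx_2) at the current profile.
    """
    x1, x2 = x0
    for n in range(1, steps + 1):
        gamma = 0.1 / math.sqrt(n)  # hypothetical vanishing step size
        g1, g2 = v(x1, x2)          # both gradients use the current profile
        x1 = min(1.0, max(-1.0, x1 + gamma * g1))
        x2 = min(1.0, max(-1.0, x2 + gamma * g2))
    return x1, x2


def v_bilinear(x1: float, x2: float) -> Vec:
    # u_1 = x1*x2, u_2 = -x1*x2: monotone but not strictly monotone
    return x2, -x1


def v_strong(x1: float, x2: float) -> Vec:
    # u_1 = -x1**2 + x1*x2, u_2 = -x2**2 - x1*x2: strongly monotone
    return -2.0 * x1 + x2, -2.0 * x2 - x1


if __name__ == "__main__":
    x0 = (0.5, 0.5)
    for name, v in (("bilinear", v_bilinear), ("strongly monotone", v_strong)):
        x = gradient_play(v, x0, 10_000)
        print(name, "distance to equilibrium:", math.hypot(*x))
```

In the bilinear game each unprojected step multiplies the distance to the origin by $\sqrt{1+\gamma_n^2} \ge 1$, so play keeps orbiting the equilibrium without approaching it. In the strongly monotone game each step multiplies that distance by $\sqrt{1 - 4\gamma_n + 5\gamma_n^2} < 1$, and play converges.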

Abstract

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability $1$. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
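
For reference, one common way to state the monotonicity condition mentioned in the abstract is Rosen's diagonal strict concavity, written here without player weights; the paper's precise formulation may differ. Let $v_i(x) = \nabla_{x_i} u_i(x)$ denote player $i$'s individual payoff gradient, $v(x) = (v_1(x), \dots, v_N(x))$ their profile, and $\mathcal{X} = \prod_i \mathcal{X}_i$ the joint action space. The condition reads

$$\langle v(x') - v(x), x' - x \rangle = \sum_{i=1}^{N} \langle v_i(x') - v_i(x), x_i' - x_i \rangle < 0 \quad \text{for all } x \neq x' \in \mathcal{X}.$$

In a concave game with convex compact action sets, $x^*$ is a Nash equilibrium if and only if $\langle v(x^*), x - x^* \rangle \le 0$ for all $x \in \mathcal{X}$. Adding this to the condition above with $x' = x^*$ gives

$$\langle v(x), x - x^* \rangle < 0 \quad \text{for all } x \neq x^*,$$

and for an exact gradient step $x^+ = x + \gamma v(x)$,

$$\|x^+ - x^*\|^2 = \|x - x^*\|^2 + 2\gamma \langle v(x), x - x^* \rangle + \gamma^2 \|v(x)\|^2,$$

so for small enough $\gamma$ the distance to $x^*$ decreases. Projecting back onto $\mathcal{X}$ cannot increase it, since the Euclidean projection is nonexpansive and $x^* \in \mathcal{X}$. With bandit feedback, $v(x)$ is replaced by a noisy, biased estimate built from observed payoffs, which is where the stochastic analysis comes in.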

Taxonomy

Topics: Advanced Bandit Algorithms Research · Experimental Behavioral Economics Studies · Game Theory and Applications