Uncoupled Bandit Learning towards Rationalizability: Benchmarks, Barriers, and Algorithms

Jibang Wu; Haifeng Xu; Fan Yao

arXiv:2111.05486 · cs.GT · December 27, 2023

TL;DR

This paper studies whether uncoupled bandit learning algorithms can reach rationalizability in general games. It shows that many existing no-regret algorithms need exponentially many rounds to do so, and proposes a new algorithm, Exp3-DH, that eliminates iteratively dominated actions within polynomially many rounds in self-play.

Contribution

The paper introduces Exp3-DH, a bandit algorithm that converges to rationalizability within polynomially many rounds in general games, overcoming the exponential barriers faced by a broad class of no-regret algorithms.

Findings

01

Many no-regret algorithms take exponentially many rounds to reach rationalizability.

02

Algorithms with the stronger no-swap-regret guarantee also suffer from exponential inefficiency.

03

Exp3-DH converges to rationalizability within polynomially many rounds in self-play.

Abstract

Under the uncoupled learning setup, a last-iterate convergence guarantee towards Nash equilibrium has been shown to be impossible in many games. This work studies last-iterate convergence in general games towards rationalizability, a key solution concept in epistemic game theory that relaxes the stringent belief assumptions of both Nash and correlated equilibrium. This learning task naturally generalizes best arm identification problems, owing to the intrinsic connection between rationalizable action profiles and the iterated elimination of dominated actions. Although the task seems simple, our first main result is a surprisingly negative one: a large and natural class of no-regret algorithms, including the entire family of Dual Averaging algorithms, provably take exponentially many rounds to reach rationalizability. Moreover, algorithms with the stronger no swap…
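The connection between rationalizability and the elimination of iteratively dominated actions can be illustrated with a small sketch. The code below is not the paper's algorithm and makes two simplifying assumptions: the full payoff matrices are known (the paper's learners only see bandit feedback), and an action is removed only when another pure action strictly dominates it (rationalizability in general also removes actions dominated by mixed strategies). The function name `iterated_elimination` and the example payoffs are hypothetical.

```python
import numpy as np

def iterated_elimination(payoff_row, payoff_col):
    """Iteratively remove actions strictly dominated by another pure action.

    payoff_row[r, c] and payoff_col[r, c] are the row and column player's
    payoffs when the row player picks r and the column player picks c.
    Returns the indices of the surviving row and column actions.
    """
    A = np.asarray(payoff_row, dtype=float)
    B = np.asarray(payoff_col, dtype=float)
    rows = list(range(A.shape[0]))
    cols = list(range(A.shape[1]))
    changed = True
    while changed:
        changed = False
        # Row action r goes if some surviving r2 is strictly better
        # against every surviving column action.
        for r in list(rows):
            if any(all(A[r2, c] > A[r, c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        # Same check for the column player, against surviving rows only.
        for c in list(cols):
            if any(all(B[r, c2] > B[r, c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Hypothetical 2x2 game: row 1 is dominated outright, and column 0
# becomes dominated only after row 1 has been removed.
A = [[2, 3],
     [1, 0]]
B = [[1, 2],
     [3, 0]]
rows, cols = iterated_elimination(A, B)
print(rows, cols)  # [0] [1]
```

In this example neither column action is dominated at the start; the column player's elimination depends on first reasoning that the row player will not play row 1. Chains of this kind, where each elimination unlocks the next, are what make the learning task iterative.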

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Game Theory and Applications