Uncoupled Bandit Learning towards Rationalizability: Benchmarks, Barriers, and Algorithms
Jibang Wu, Haifeng Xu, Fan Yao

TL;DR
This paper investigates the challenge of achieving rationalizability in general games through bandit learning algorithms, revealing exponential inefficiencies in many existing methods and proposing a new algorithm, Exp3-DH, that efficiently eliminates dominated actions.
Contribution
The paper introduces Exp3-DH, a novel bandit algorithm that efficiently converges to rationalizability in general games, overcoming exponential barriers faced by previous algorithms.
Findings
Many no-regret algorithms, including the entire family of Dual Averaging algorithms, provably take exponentially many rounds to reach rationalizability.
Algorithms with the stronger no-swap-regret guarantee also suffer from exponential inefficiency.
Exp3-DH converges to rationalizability within polynomially many rounds in self-play.
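For orientation, the name Exp3-DH presumably builds on the classical Exp3 bandit algorithm. This summary does not describe how Exp3-DH modifies it, so the sketch below shows only textbook Exp3 for a single learner and is not the authors' algorithm. The function names, the `reward_fn` callback, and the parameter values are illustrative assumptions.

```python
import math
import random


def exp3_probs(log_weights, gamma):
    """Mix the exponential-weights distribution with uniform exploration."""
    m = max(log_weights)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    k = len(w)
    return [(1 - gamma) * wi / total + gamma / k for wi in w]


def run_exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Textbook Exp3 (not Exp3-DH). reward_fn(arm, t) must return a value in [0, 1]."""
    rng = random.Random(seed)
    log_w = [0.0] * n_arms
    for t in range(n_rounds):
        probs = exp3_probs(log_w, gamma)
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, t)
        # Bandit feedback: only the pulled arm is updated, using an
        # importance-weighted reward estimate reward / probs[arm].
        log_w[arm] += gamma * (reward / probs[arm]) / n_arms
    return exp3_probs(log_w, gamma)


# Hypothetical stationary bandit: arm 0 always pays 1, the others pay 0.
final_probs = run_exp3(lambda arm, t: 1.0 if arm == 0 else 0.0, n_arms=3, n_rounds=500)
```

In the uncoupled setting each player would run such a learner independently, observing only its own realized payoff.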
Abstract
Under the uncoupled learning setup, the last-iterate convergence guarantee towards Nash equilibrium is shown to be impossible in many games. This work studies the last-iterate convergence guarantee in general games toward rationalizability, a key solution concept in epistemic game theory that relaxes the stringent belief assumptions in both Nash and correlated equilibrium. This learning task naturally generalizes best arm identification problems, due to the intrinsic connections between rationalizable action profiles and the elimination of iteratively dominated actions. Although the task is seemingly simple, our first main result is surprisingly negative: a large and natural class of no-regret algorithms, including the entire family of Dual Averaging algorithms, provably takes exponentially many rounds to reach rationalizability. Moreover, algorithms with the stronger no swap…
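The connection between rationalizability and the elimination of iteratively dominated actions can be illustrated with a small sketch. It assumes a two-player game with fully known payoff matrices and checks only strict dominance by pure actions, which is a simplification: the standard definition also allows dominance by mixed strategies, and the paper's learners see only bandit feedback rather than the full matrices. The function name and the example game are hypothetical.

```python
def iterated_elimination(row_payoffs, col_payoffs):
    """Iteratively remove actions strictly dominated by another pure action.

    row_payoffs[r][c] and col_payoffs[r][c] are the row and column player's
    payoffs when the row player picks r and the column player picks c.
    Returns the sorted surviving row actions and column actions.
    """
    rows = set(range(len(row_payoffs)))
    cols = set(range(len(row_payoffs[0])))
    changed = True
    while changed:
        changed = False
        for r in sorted(rows):
            # r is dominated if some other surviving row is strictly better
            # against every surviving column.
            if any(all(row_payoffs[r2][c] > row_payoffs[r][c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in sorted(cols):
            if any(all(col_payoffs[r][c2] > col_payoffs[r][c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return sorted(rows), sorted(cols)


# Example: column 2 is dominated first, which makes row 1 dominated,
# which in turn makes column 0 dominated.
ROW_PAYOFFS = [[1, 1, 0],
               [0, 0, 2]]
COL_PAYOFFS = [[0, 2, 1],
               [3, 1, 0]]
survivors = iterated_elimination(ROW_PAYOFFS, COL_PAYOFFS)
```

In this example no row action is dominated until a column action has been removed, which is the kind of chained dependency that makes the learning task harder than a single best arm identification problem.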
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Game Theory and Applications
