Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu, Shuji Kijima

TL;DR
This paper analyzes the sample complexity of identifying nonredundant items in nontransitive dueling bandit problems, extending the analysis beyond transitive relations to cyclic preferences such as rock-paper-scissors.
Contribution
It introduces bounds on sample complexity for detecting indispensable items in nontransitive dueling bandits, a novel analysis for such cyclic preference relations.
Findings
Provides upper and lower bounds on the sample complexity, based on matrix determinants.
Characterizes the complexity of identifying nonredundant items.
Extends dueling bandit analysis to nontransitive, cyclic relations.
Abstract
The dueling bandit is a variant of the multi-armed bandit in which a binary relation is learned through pairwise comparisons. Most work on dueling bandits has targeted transitive relations, that is, totally or partially ordered sets, or has at least assumed the existence of a champion such as a Condorcet winner or a Copeland winner. This work develops an analysis of dueling bandits for non-transitive relations. Jan-ken (a.k.a. rock-paper-scissors) is a typical example of a non-transitive relation. It is known that a rational player chooses one of the three items uniformly at random, and that this strategy is a Nash equilibrium in game theory. Interestingly, any variant of Jan-ken with four items (e.g., rock, paper, scissors, and well) contains at least one useless item, which is never selected by a rational player. This work investigates a dueling bandit problem to identify whether all items are indispensable in a given…
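The two game-theoretic claims in the abstract can be checked by hand on small examples. The following is a minimal sketch, not the paper's algorithm: it assumes the common rules for the four-item variant (well beats rock and scissors and loses to paper), which the abstract does not spell out, and the names `RPSW`, `payoffs_against`, and `is_symmetric_equilibrium` are illustrative. For a symmetric zero-sum game with skew-symmetric payoff matrix A, a mixed strategy x is a symmetric Nash equilibrium exactly when no pure strategy earns a positive payoff against it, that is, when every entry of Ax is at most 0.

```python
from fractions import Fraction

# Payoff to the row item against the column item: +1 win, -1 loss, 0 tie.
# Order: rock, paper, scissors, well.
# Assumed rules for "well": it beats rock and scissors and loses to paper.
RPSW = [
    [0, -1, 1, -1],   # rock
    [1, 0, -1, 1],    # paper
    [-1, 1, 0, -1],   # scissors
    [1, -1, 1, 0],    # well
]
# Classic three-item game: drop the "well" row and column.
RPS = [row[:3] for row in RPSW[:3]]


def payoffs_against(A, x):
    """Expected payoff of each pure item when the opponent plays mixed strategy x."""
    return [sum(Fraction(a) * xj for a, xj in zip(row, x)) for row in A]


def is_symmetric_equilibrium(A, x):
    """True if x is a probability vector and no pure item gains against x."""
    is_distribution = sum(x) == 1 and all(xi >= 0 for xi in x)
    return is_distribution and all(p <= 0 for p in payoffs_against(A, x))


third = Fraction(1, 3)
uniform3 = [third, third, third]
no_rock = [Fraction(0), third, third, third]   # uniform over paper, scissors, well
uniform4 = [Fraction(1, 4)] * 4

print(is_symmetric_equilibrium(RPS, uniform3))    # uniform play in the 3-item game
print(is_symmetric_equilibrium(RPSW, no_rock))    # rock dropped in the 4-item game
print(is_symmetric_equilibrium(RPSW, uniform4))   # uniform play over all 4 items
print(payoffs_against(RPSW, no_rock))             # rock's entry is strictly negative
```

Under these assumed rules, rock earns a strictly negative payoff against the strategy that mixes uniformly over paper, scissors, and well, so a rational player never selects it. This is the sense in which an item is "useless" in the abstract; the paper's question is how many noisy comparisons are needed to decide whether such an item exists.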
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Artificial Intelligence in Games
