Towards Fundamental Limits of Multi-armed Bandits with Random Walk Feedback
Tianyu Wang, Lin F. Yang, Zizhuo Wang

TL;DR
This paper explores a novel multi-armed bandit problem where arms are graph nodes, and feedback is obtained through random walk trajectories, analyzing both stochastic and adversarial scenarios to understand fundamental limits.
Contribution
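The paper introduces an MAB framework in which arms are graph nodes and feedback arrives as random walk trajectories, and it characterizes the problem's information-theoretic difficulty and the behavior of bandit algorithms in both the stochastic and adversarial settings.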
Findings
The problem is no easier than a standard MAB in an information-theoretic sense
The extra information in the random walk trajectories does not make the problem easier
The behavior of bandit algorithms on this problem is also analyzed
Abstract
In this paper, we consider a new Multi-Armed Bandit (MAB) problem where arms are nodes in an unknown and possibly changing graph, and the agent (i) initiates random walks over the graph by pulling arms, (ii) observes the random walk trajectories, and (iii) receives rewards equal to the lengths of the walks. We provide a comprehensive understanding of this problem by studying both the stochastic and the adversarial setting. We show that this problem is not easier than a standard MAB in an information theoretical sense, although additional information is available through random walk trajectories. Behaviors of bandit algorithms on this problem are also studied.
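The interaction described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy graph, the transition probabilities, the absorbing state named `END`, and the helper names `random_walk` and `pull_arm` are all hypothetical choices made here for illustration. The sketch also assumes that a walk ends when it reaches the absorbing state and that its length is the number of edges traversed, including the final step into `END`.

```python
import random

# Hypothetical toy graph: each node maps to {next_node: probability}.
# "END" is an assumed absorbing state that terminates the walk.
TOY_GRAPH = {
    "a": {"b": 0.5, "END": 0.5},
    "b": {"a": 0.3, "END": 0.7},
}


def random_walk(transitions, start, rng, terminal="END"):
    """Walk from `start` until the assumed absorbing state is reached."""
    trajectory = [start]
    node = start
    while node != terminal:
        neighbors, weights = zip(*transitions[node].items())
        node = rng.choices(neighbors, weights=weights, k=1)[0]
        trajectory.append(node)
    return trajectory


def pull_arm(transitions, arm, rng):
    """Pull an arm (a node): return the observed trajectory and the reward."""
    trajectory = random_walk(transitions, arm, rng)
    reward = len(trajectory) - 1  # assumed: walk length = edges traversed
    return trajectory, reward


rng = random.Random(0)
trajectory, reward = pull_arm(TOY_GRAPH, "a", rng)
print(trajectory, reward)
```

A standard MAB learner would use only `reward`; in this setting the agent also sees `trajectory`, which is the additional information the abstract refers to.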
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
