Towards Fundamental Limits of Multi-armed Bandits with Random Walk Feedback

Tianyu Wang; Lin F. Yang; Zizhuo Wang

arXiv:2011.01445·cs.LG·June 28, 2022

TL;DR

This paper studies a multi-armed bandit problem in which arms are nodes of an unknown, possibly changing graph: pulling an arm starts a random walk whose trajectory is observed and whose length is the reward. Both the stochastic and the adversarial settings are analyzed.

Contribution

The paper introduces an MAB formulation with graph nodes as arms and random walk trajectories as feedback, shows that it is no easier than a standard MAB in an information-theoretic sense, and studies how bandit algorithms behave on it.

Findings

01. The problem is no easier than a standard MAB in an information-theoretic sense.

02. The additional information in random walk trajectories does not simplify the problem.

03. The behavior of bandit algorithms in this setting is analyzed.

Abstract

In this paper, we consider a new Multi-Armed Bandit (MAB) problem where arms are nodes in an unknown and possibly changing graph, and the agent (i) initiates random walks over the graph by pulling arms, (ii) observes the random walk trajectories, and (iii) receives rewards equal to the lengths of the walks. We provide a comprehensive understanding of this problem by studying both the stochastic and the adversarial setting. We show that this problem is not easier than a standard MAB in an information theoretical sense, although additional information is available through random walk trajectories. Behaviors of bandit algorithms on this problem are also studied.
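The interaction described in the abstract (pull an arm, observe the random walk trajectory, receive the walk length as reward) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model or algorithm: the toy graph, its transition probabilities, the use of `None` as an absorbing state that ends the walk, the choice to count length as the number of nodes visited, and the helper names `random_walk`, `pull`, and `estimate_mean_rewards` are all hypothetical.

```python
import random

# Hypothetical toy graph (not from the paper): each node maps to
# (possible next nodes, transition probabilities). None marks an assumed
# absorbing state at which the walk ends.
EXAMPLE_GRAPH = {
    "a": (["b", None], [0.7, 0.3]),
    "b": (["a", "c", None], [0.2, 0.3, 0.5]),
    "c": ([None], [1.0]),
}


def random_walk(graph, start, rng, max_steps=10_000):
    """Return the list of nodes visited by a walk started at `start`.

    The walk ends when it moves to None. `max_steps` caps the number of
    transitions as a safety guard, so a very long walk is truncated.
    """
    trajectory = [start]
    node = start
    for _ in range(max_steps):
        next_nodes, probs = graph[node]
        node = rng.choices(next_nodes, weights=probs, k=1)[0]
        if node is None:
            break
        trajectory.append(node)
    return trajectory


def pull(graph, arm, rng):
    """Pull `arm`: run a walk from it and return (trajectory, reward).

    The reward is the walk length, counted here (by assumption) as the
    number of nodes visited.
    """
    trajectory = random_walk(graph, arm, rng)
    return trajectory, len(trajectory)


def estimate_mean_rewards(graph, pulls_per_arm, seed=0):
    """Average the observed reward of each arm over repeated pulls."""
    rng = random.Random(seed)
    means = {}
    for arm in graph:
        total = sum(pull(graph, arm, rng)[1] for _ in range(pulls_per_arm))
        means[arm] = total / pulls_per_arm
    return means


if __name__ == "__main__":
    rng = random.Random(0)
    trajectory, reward = pull(EXAMPLE_GRAPH, "a", rng)
    print("trajectory:", trajectory, "reward:", reward)
    print("estimated mean rewards:", estimate_mean_rewards(EXAMPLE_GRAPH, 1000))
```

The sketch only shows the feedback model: each pull reveals a whole trajectory rather than a single scalar. It uses a fixed graph, so it corresponds loosely to a stochastic setting, and it does not implement any of the bandit algorithms or bounds studied in the paper.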

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms