Detecting an Odd Restless Markov Arm with a Trembling Hand
P. N. Karthik, Rajesh Sundaresan

TL;DR
This paper develops a theoretical framework for quickly identifying an odd Markov arm among multiple restless arms when the decision maker has a trembling hand. It derives an asymptotic lower bound on the detection time and strategies that nearly achieve it.
Contribution
It introduces the first asymptotic lower bound and near-optimal strategies for identifying an odd restless Markov arm with trembling hand errors.
Findings
Derived the first asymptotic lower bound on detection time.
Constructed strategies whose detection time approaches the lower bound as the error probability vanishes.
Extended analysis to restless Markov arms, unlike prior i.i.d. or rested models.
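The restless/rested distinction is what makes the problem harder. A rested arm changes state only when it is pulled. A restless arm keeps evolving whether or not it is observed. So if an arm was last seen in some state d steps ago, its current state follows the corresponding row of the d-step matrix P^d, not the one-step matrix P. The sketch below illustrates this with a hypothetical helper, `next_state_distribution`, and an arbitrary 2-state matrix. Neither the helper nor the matrix comes from the paper.

```python
import numpy as np

def next_state_distribution(P, last_state, delay, restless=True):
    """Distribution of an arm's state when it is observed again (illustrative).

    P          : (S, S) row-stochastic transition matrix of the arm.
    last_state : state seen the last time this arm was observed.
    delay      : number of time steps since that observation (>= 1).
    restless   : if True, the arm kept evolving while unobserved, so it has
                 moved `delay` steps; if False (rested), it moved one step.
    """
    steps = delay if restless else 1
    return np.linalg.matrix_power(P, steps)[last_state]

# Hypothetical 2-state transition matrix, chosen only for illustration.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

print(next_state_distribution(P, 0, 1))                  # one-step row of P
print(next_state_distribution(P, 0, 3, restless=False))  # rested: still one step
print(next_state_distribution(P, 0, 3))                  # restless: row 0 of P^3, about [0.781, 0.219]
```

Because the delay between successive observations of an arm depends on the pulling strategy, a restless-arm strategy must account for these multi-step transitions. An i.i.d. or rested model never has to.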
Abstract
In this paper, we consider a multi-armed bandit in which each arm is a Markov process evolving on a finite state space. The state space is common across the arms, and the arms are independent of each other. The transition probability matrix of one of the arms (the odd arm) is different from the common transition probability matrix of all the other arms. A decision maker, who knows these transition probability matrices, wishes to identify the odd arm as quickly as possible, while keeping the probability of decision error small. To do so, the decision maker collects observations from the arms by pulling the arms in a sequential manner, one at each discrete time instant. However, the decision maker has a trembling hand, and the arm that is actually pulled at any given time differs, with a small probability, from the one he intended to pull. The observation at any given time is the arm that…
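The observation model in the abstract can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the authors' construction. The abstract is cut off before it fully specifies the observation, so the sketch assumes that the decision maker sees the index of the arm actually pulled together with that arm's current state. It also assumes the following trembling-hand model: with probability `eta`, the pulled arm is drawn uniformly from all K arms; otherwise it is the intended arm. All function and parameter names (`simulate`, `policy`, `eta`, the initial state 0) are hypothetical.

```python
import numpy as np

def simulate(P_odd, P_common, odd_arm, K, policy, T, eta, rng):
    """Simulate T steps of K independent restless Markov arms (illustrative).

    Assumptions (not taken from the paper):
      * every arm starts in state 0;
      * all arms transition at every step, pulled or not (restless);
      * with probability eta the pulled arm is uniform over all K arms,
        otherwise it is the arm chosen by `policy`;
      * the observation is (actual arm, its current state).
    Returns a list of (intended arm, actual arm, observed state) tuples.
    """
    S = P_common.shape[0]
    states = np.zeros(K, dtype=int)
    history = []
    for t in range(T):
        intended = policy(t, history)
        # Trembling hand: occasionally the wrong arm is pulled.
        actual = int(rng.integers(K)) if rng.random() < eta else intended
        history.append((intended, actual, int(states[actual])))
        # Restless dynamics: every arm moves one step, observed or not.
        for k in range(K):
            P = P_odd if k == odd_arm else P_common
            states[k] = rng.choice(S, p=P[states[k]])
    return history

# Hypothetical example: 4 arms, arm 2 is odd, round-robin intended pulls.
P_common = np.array([[0.9, 0.1], [0.2, 0.8]])
P_odd = np.array([[0.5, 0.5], [0.5, 0.5]])
round_robin = lambda t, history: t % 4
rng = np.random.default_rng(0)
hist = simulate(P_odd, P_common, odd_arm=2, K=4, policy=round_robin,
                T=1000, eta=0.1, rng=rng)
trembles = sum(1 for intended, actual, _ in hist if intended != actual)
print(f"{trembles} of {len(hist)} pulls went to an unintended arm")
```

With `eta = 0.1` and K = 4, roughly 7.5% of pulls should land on an unintended arm, since a uniformly drawn arm still matches the intended one a quarter of the time. A detection strategy must work from the resulting observation stream without knowing, from the intended action alone, which arm was actually sampled.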
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
