Pure Exploration under Mediators' Feedback
Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli

TL;DR
This paper introduces a new framework for best-arm identification in stochastic bandits where mediators, not the learner, select arms, and develops strategies that are nearly optimal even when mediator policies are unknown.
Contribution
It generalizes classical bandit problems to include mediators' feedback, providing lower bounds and optimal algorithms for both known and unknown mediator policies.
Findings
Derives a lower bound on the sample complexity of BAI-MF.
Proposes an algorithm whose sample complexity matches this lower bound when the mediators' policies are known.
Extends the results to unknown mediators' policies, with comparable guarantees.
Abstract
Stochastic multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a stochastic reward. Within the context of best-arm identification (BAI) problems, the goal of the agent lies in finding the optimal arm, i.e., the one with highest expected reward, as accurately and efficiently as possible. Nevertheless, the sequential interaction protocol of classical BAI problems, where the agent has complete control over the arm being pulled at each round, does not effectively model several decision-making problems of interest (e.g., off-policy learning, partially controllable environments, and human feedback). For this reason, in this work, we propose a novel strict generalization of the classical BAI problem that we refer to as best-arm identification under mediators' feedback (BAI-MF). More specifically, we consider…
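To make the interaction protocol concrete, here is a minimal Python sketch of a mediated setting in which the learner chooses a mediator and the mediator, not the learner, samples the arm. It is an illustration under stated assumptions, not the paper's algorithm: the Bernoulli rewards, the round-robin choice of mediator, the fixed budget, and all names (`run_mediated_bai`, `arm_means`, `mediator_policies`) are hypothetical choices made for this example.

```python
import random


def run_mediated_bai(arm_means, mediator_policies, n_rounds, seed=0):
    """Naive fixed-budget baseline for a mediated bandit (illustrative only).

    arm_means: assumed Bernoulli mean of each arm (hidden from the learner).
    mediator_policies: one probability vector over arms per mediator.
    Returns the empirically best arm and the per-arm pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    reward_sums = [0.0] * n_arms

    for t in range(n_rounds):
        # The learner only picks a mediator (here: plain round-robin).
        mediator = t % len(mediator_policies)
        # The mediator samples the arm from its own stochastic policy.
        arm = rng.choices(range(n_arms), weights=mediator_policies[mediator])[0]
        # The learner observes which arm was pulled and a Bernoulli reward.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward

    # Arms that were never pulled get -inf so they cannot be recommended.
    empirical_means = [
        reward_sums[a] / pulls[a] if pulls[a] > 0 else float("-inf")
        for a in range(n_arms)
    ]
    best_arm = max(range(n_arms), key=lambda a: empirical_means[a])
    return best_arm, pulls


if __name__ == "__main__":
    policies = [[0.8, 0.2], [0.3, 0.7]]  # two hypothetical mediators
    best, counts = run_mediated_bai([0.1, 0.9], policies, n_rounds=2000)
    print("recommended arm:", best, "pull counts:", counts)
```

The classical BAI setting is recovered in this sketch when there is one mediator per arm and each mediator's policy puts probability 1 on its own arm, so that choosing a mediator is the same as choosing an arm.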
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Distributed Sensor Networks and Detection Algorithms
