Best arm identification in multi-armed bandits with delayed feedback

Aditya Grover; Todor Markov; Peter Attia; Norman Jin; Nicholas Perkins; Bryan Cheong; Michael Chen; Zi Yang; Stephen Harris; William Chueh; Stefano Ermon

arXiv:1803.10937 · cs.LG · March 30, 2018 · 22 cites

TL;DR

This paper extends best arm identification in stochastic multi-armed bandits to settings where every arm pull has delayed feedback, and proposes a framework that uses partial feedback received before a pull completes to improve sample efficiency in sequential and parallel settings.

Contribution

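It introduces a general framework for modeling the relationship between partial and delayed feedback, efficient algorithms for the special cases where the partial feedback is a biased or an unbiased estimator of the delayed feedback, and an extension of these algorithms to the parallel bandit setting.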

Findings

01. Exploiting partial feedback improves sample efficiency.

02. Algorithms outperform baselines in real-world policy search tasks.

03. Parallel algorithms effectively handle batch arm selections.

Abstract

We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework to model the relationship between partial and delayed feedback, and as a special case we introduce efficient algorithms for settings where the partial feedback is a biased or unbiased estimator of the delayed feedback. Additionally, we propose a novel extension of the algorithms to the parallel MAB setting where an agent can control a batch of arms. Our experiments in real-world settings, involving policy search and hyperparameter optimization in computational sustainability domains for fast…
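To make the role of partial feedback concrete, here is a minimal Python sketch. It is not the paper's algorithm: it is a generic successive-elimination routine with Hoeffding-style confidence bounds, run on a hypothetical toy model in which a pull lasts `delay` time steps and emits one Bernoulli partial observation per step, each an unbiased estimator of the arm's mean. The function names, the toy feedback model, and the confidence radius are all assumptions made for illustration.

```python
import math
import random


def radius(n, num_arms, delta):
    # Hoeffding-style anytime confidence radius for n samples in [0, 1]
    # (an assumed, standard choice; not taken from the paper).
    return math.sqrt(math.log(4.0 * num_arms * n * n / delta) / (2.0 * n))


def identify_best_arm(means, delay, use_partial, delta=0.05, seed=0):
    """Successive elimination on a toy delayed-feedback model.

    Hypothetical model: a pull takes `delay` time steps and produces one
    Bernoulli(means[arm]) partial observation per step. The delayed
    feedback is the average of those partial observations.
    Returns (identified_arm, total_time_steps).
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    sums = [0.0] * k
    counts = [0] * k
    steps = 0
    while len(active) > 1:
        for arm in active:
            partials = [1.0 if rng.random() < means[arm] else 0.0
                        for _ in range(delay)]
            steps += delay
            if use_partial:
                # Treat every partial observation as its own sample.
                sums[arm] += sum(partials)
                counts[arm] += delay
            else:
                # Use only the single delayed reward of the finished pull.
                sums[arm] += sum(partials) / delay
                counts[arm] += 1
        # All active arms have the same sample count (round-robin pulls).
        r = radius(counts[active[0]], k, delta)
        est = {a: sums[a] / counts[a] for a in active}
        best = max(est.values())
        # Keep an arm only if its upper bound reaches the best lower bound.
        active = [a for a in active if est[a] + r >= best - r]
    return active[0], steps


arm_p, steps_p = identify_best_arm([0.3, 0.5, 0.8], delay=10, use_partial=True)
arm_f, steps_f = identify_best_arm([0.3, 0.5, 0.8], delay=10, use_partial=False)
print("with partial feedback:", arm_p, steps_p)
print("delayed feedback only:", arm_f, steps_f)
```

In this toy model the variant that uses partial feedback shrinks its confidence intervals faster per unit of time, because each pull contributes `delay` samples instead of one. The sketch covers only the unbiased sequential case; biased partial feedback and batched (parallel) arm selection are not modeled here.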

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Distributed Sensor Networks and Detection Algorithms