Pure Exploration Beyond Reward Feedback: The Role of Post-Action Context
Mohammad Shahverdikondori, Amir Mohammad Abouei, Alireza Rezaeimoghadam, Negar Kiyavash

TL;DR
This paper studies best arm identification in stochastic bandits where the learner observes a post-action context alongside the reward, and proposes asymptotically optimal algorithms that use this context to identify the best arm with fewer samples than methods that ignore it.
Contribution
The paper introduces the problem of BAI with post-action context, derives instance-dependent lower bounds on the sample complexity, and develops algorithms that asymptotically achieve the optimal sample complexity.
Findings
Algorithms utilizing post-action context outperform those ignoring it.
In the separator setting, the G-tracking sampling rule makes effective use of the context geometry.
In the non-separator setting, an extended Track-and-Stop algorithm asymptotically achieves the optimal sample complexity.
Abstract
We introduce the problem of best arm identification (BAI) with post-action context, a new BAI problem in a stochastic multi-armed bandit environment under the fixed-confidence setting. The problem addresses scenarios in which the learner receives a post-action context in addition to the reward after playing each action. This post-action context provides additional information that can significantly facilitate the decision process. We analyze two types of post-action context: (i) separator, where the reward depends solely on the context, and (ii) non-separator, where the reward depends on both the action and the context. For both cases, we derive instance-dependent lower bounds on the sample complexity and propose algorithms that asymptotically achieve the optimal sample complexity. For the separator setting, we propose a novel sampling rule called G-tracking, which uses…
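The difference between the separator and non-separator settings can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's formulation or algorithms: it assumes a finite set of contexts, a known context distribution for each arm, and unit-variance Gaussian reward noise, and all names (`context_probs`, `context_means`, `reward_means`, `pull`) and numbers are hypothetical.

```python
import random

def arm_means_separator(context_probs, context_means):
    """Expected reward per arm when the reward depends only on the context."""
    return [sum(p * m for p, m in zip(row, context_means))
            for row in context_probs]

def arm_means_non_separator(context_probs, reward_means):
    """Expected reward per arm when the reward depends on (arm, context)."""
    return [sum(p * m for p, m in zip(p_row, m_row))
            for p_row, m_row in zip(context_probs, reward_means)]

def pull(arm, context_probs, mean_of, rng):
    """Play one arm; return the observed post-action context and the reward.

    mean_of(arm, z) gives the mean reward; Gaussian noise is an assumption.
    """
    contexts = range(len(context_probs[arm]))
    z = rng.choices(contexts, weights=context_probs[arm])[0]
    reward = rng.gauss(mean_of(arm, z), 1.0)
    return z, reward

# Hypothetical instance: 2 arms, 2 contexts.
context_probs = [[0.8, 0.2],   # P(context | arm 0)
                 [0.3, 0.7]]   # P(context | arm 1)
context_means = [0.0, 1.0]     # separator: mean reward per context
reward_means = [[0.0, 1.0],    # non-separator: mean reward per (arm, context)
                [0.5, 0.2]]

sep = arm_means_separator(context_probs, context_means)
non = arm_means_non_separator(context_probs, reward_means)
best_sep = max(range(len(sep)), key=sep.__getitem__)
best_non = max(range(len(non)), key=non.__getitem__)

rng = random.Random(0)
z, r = pull(0, context_probs, lambda a, c: context_means[c], rng)
```

In the separator case, every pull is informative about the mean reward of the observed context no matter which arm produced it, which is the kind of shared information a context-aware sampling rule can exploit. In the non-separator case, a pull is informative only about the played (arm, context) pair.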
