Pure Exploration Beyond Reward Feedback: The Role of Post-Action Context

Mohammad Shahverdikondori; Amir Mohammad Abouei; Alireza Rezaeimoghadam; Negar Kiyavash

arXiv:2502.03061·cs.LG·May 13, 2026

TL;DR

This paper studies best arm identification (BAI) in stochastic multi-armed bandits where the learner observes a post-action context in addition to the reward, and proposes algorithms that use this context to asymptotically achieve the optimal sample complexity in the fixed-confidence setting.

Contribution

The paper introduces a new BAI problem with post-action context, derives instance-dependent lower bounds on the sample complexity, and develops algorithms that asymptotically achieve the optimal sample complexity.

Findings

01

Algorithms that use the post-action context outperform those that ignore it.

02

The G-tracking sampling rule makes effective use of the context geometry in the separator setting.

03

An extended Track-and-Stop algorithm is asymptotically optimal in the non-separator setting.

Abstract

We introduce the problem of best arm identification (BAI) with post-action context, a new BAI problem in a stochastic multi-armed bandit environment under the fixed-confidence setting. The problem addresses scenarios in which the learner receives a post-action context in addition to the reward after playing each action. This post-action context provides additional information that can significantly facilitate the decision process. We analyze two types of post-action context: (i) separator, where the reward depends solely on the context, and (ii) non-separator, where the reward depends on both the action and the context. For both cases, we derive instance-dependent lower bounds on the sample complexity and propose algorithms that asymptotically achieve the optimal sample complexity. For the separator setting, we propose a novel sampling rule called G-tracking, which uses…
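To make the distinction between the two context types concrete, the following is a minimal simulation sketch, not the authors' code. It assumes a finite set of contexts, a context drawn from an arm-specific distribution after each pull, and unit-variance Gaussian rewards; the names `context_probs`, `reward_means`, `arm_means`, and `pull` are hypothetical and chosen only for illustration. Under these assumptions the separator setting is the special case in which every arm shares the same row of per-context reward means.

```python
import numpy as np


def arm_means(context_probs, reward_means):
    """Expected reward of each arm.

    context_probs[a, z]: probability of observing context z after playing arm a.
    reward_means[a, z]:  mean reward given arm a and context z (illustrative model).
    """
    return np.sum(context_probs * reward_means, axis=1)


def pull(rng, arm, context_probs, reward_means):
    """Play one arm and return (post-action context, reward)."""
    z = rng.choice(context_probs.shape[1], p=context_probs[arm])
    # Assumption: Gaussian reward noise with unit variance.
    reward = rng.normal(reward_means[arm, z], 1.0)
    return int(z), float(reward)


# Two arms, two contexts (all numbers are made up for illustration).
context_probs = np.array([[0.8, 0.2],
                          [0.3, 0.7]])

# Separator: the reward mean depends only on the context,
# so every arm has the same row of per-context means.
separator_means = np.tile(np.array([1.0, 0.0]), (2, 1))

# Non-separator: the reward mean depends on both the arm and the context.
non_separator_means = np.array([[1.0, 0.0],
                                [0.5, 2.0]])

sep_mu = arm_means(context_probs, separator_means)
non_sep_mu = arm_means(context_probs, non_separator_means)

rng = np.random.default_rng(0)
z, r = pull(rng, 0, context_probs, separator_means)
```

In this toy instance the best arm differs between the two settings: with the separator means, arm 0 is best because it reaches the high-reward context more often, while with the non-separator means, arm 1 is best. In the separator case every pull of any arm also yields a sample of the shared per-context reward, which is one way to see why observing the context can reduce the number of samples needed.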
