Multi-Armed Bandits With Best-Action Queries

Francesco Bacchiocchi; Matteo Castiglioni; Alberto Marchesi; Francesco Emanuele Stradi

arXiv:2605.08287·cs.LG·May 12, 2026

TL;DR

This paper investigates the impact of best-action queries on multi-armed bandit problems under bandit feedback, providing a complete characterization of their benefits in stochastic and adversarial settings.

Contribution

The paper resolves whether best-action queries improve regret bounds under bandit feedback: they do when rewards are stochastic and i.i.d., but not when rewards are correlated among arms.

Findings

01

Best-action queries reduce regret to $O(\min\{T/k,\,\sqrt{T-k}\})$ in stochastic i.i.d. rewards.

02

Any algorithm faces an $\Omega(\sqrt{T-k})$ regret lower bound in correlated rewards.

03

Full-feedback results do not extend to bandit feedback in correlated reward settings.

Abstract

We study \emph{multi-armed bandits} (MABs) augmented with \emph{best-action queries}, in which the learner may additionally query an oracle that reveals the best arm in the current round. This setting was recently characterized by Russo et al. [2024] in the \emph{full-feedback} model, where the learner observes the rewards of all arms after each round. They show that, in both \emph{stochastic} and \emph{adversarial} environments, $k$ best-action queries reduce the optimal $O(\sqrt{T})$ regret to $O(\min\{T/k, \sqrt{T}\})$. Whether this improvement extends to the more realistic \emph{bandit-feedback} model -- where the learner observes only the reward of the played arm -- was left as an open problem. We fully resolve this question. When rewards are stochastic but correlated among arms, we show that the full-feedback result does not carry over:…
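
The interaction protocol described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' algorithm: rewards are assumed Bernoulli and independent across arms, the hypothetical function `run_bandit_with_queries` naively spends its $k$ queries on the first $k$ rounds, and the remaining rounds use a simple epsilon-greedy rule chosen only for illustration. It shows the two kinds of information available to the learner: the oracle's answer in query rounds, and the reward of the played arm only (bandit feedback) in every round.

```python
import random


def run_bandit_with_queries(means, T, k, epsilon=0.1, seed=0):
    """Simulate T rounds of a Bernoulli bandit with at most k best-action queries.

    Illustrative baseline only (an assumption, not the paper's method):
    queries are spent on the first k rounds, then epsilon-greedy is used.
    Returns (total_reward, queries_used).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms    # number of non-query pulls per arm
    sums = [0.0] * n_arms    # sum of rewards observed on those pulls
    total_reward = 0.0
    queries_used = 0

    for _ in range(T):
        # The environment draws a reward for every arm; the learner never
        # sees this full vector (bandit feedback).
        rewards = [1.0 if rng.random() < m else 0.0 for m in means]

        if queries_used < k:
            # Best-action query: the oracle reveals an arm with the highest
            # reward in the current round (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: rewards[a])
            queries_used += 1
            used_query = True
        elif min(counts) == 0 or rng.random() < epsilon:
            # Explore: some arm has no estimate yet, or an epsilon coin flip.
            arm = rng.randrange(n_arms)
            used_query = False
        else:
            # Exploit: play the arm with the highest empirical mean.
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
            used_query = False

        # Only the reward of the played arm is observed.
        observed = rewards[arm]
        total_reward += observed

        # Samples from query rounds are biased toward high values, so this
        # sketch leaves them out of the empirical means.
        if not used_query:
            counts[arm] += 1
            sums[arm] += observed

    return total_reward, queries_used
```

For example, `run_bandit_with_queries([0.2, 0.8], T=1000, k=50)` plays 50 oracle-guided rounds followed by 950 epsilon-greedy rounds. How to schedule the queries and what to do in the other rounds are exactly the design questions the paper studies; the choices above are placeholders.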
