Conformal-Style Quantile Analyses for Stochastic Bandits
Chengyu Du, Mengfan Xu

TL;DR
This paper introduces ACP-UCB1, a conformal-style bandit algorithm that targets upper-tail performance rather than mean reward. It comes with logarithmic upper-quantile regret bounds and outperforms classical UCB1 in numerical experiments.
Contribution
The paper proposes a new conformal-style policy, ACP-UCB1, for stochastic bandits targeting upper-tail metrics, with theoretical regret guarantees and empirical validation.
Findings
ACP-UCB1 achieves logarithmic upper-quantile regret.
It outperforms UCB1 in numerical experiments.
It provides a new approach to tail-focused bandit analysis.
Abstract
Stochastic bandit algorithms are usually analyzed under a mean-reward criterion, yet many problems favor arms with strong upper-tail performance, which is the setting we study. For a fixed miscoverage level \(\alpha\), the natural upper-tail target of arm \(j\) is the upper endpoint \(F_j^{-1}(1-\alpha/2)\) of a central prediction interval. This target can rank arms differently from their means, creating a central mismatch with the classical bandit objective. To address this mismatch, we propose ACP-UCB1, a conformal-style policy that combines an adaptive conformal estimate of the upper endpoint with a UCB-type optimism bonus. The technical challenge is that the conformity scores used by ACP-UCB1 are recomputed from evolving empirical quantile estimates and evaluated at an adaptive level. We control this endpoint through reward-quantile concentration, a perturbation argument for recomputed score quantiles,…
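The ranking mismatch described in the abstract can be illustrated with a small sketch. Everything here is an assumption for illustration: the two Gaussian arms, their parameters, and the helper names `upper_tail_target` and `optimistic_index` are hypothetical and do not come from the paper. In particular, `optimistic_index` uses a plain empirical quantile plus a UCB1-style bonus as a simplified stand-in; it is not the adaptive conformal estimate that ACP-UCB1 uses.

```python
import math
from statistics import NormalDist


def upper_tail_target(dist: NormalDist, alpha: float) -> float:
    """Upper endpoint F^{-1}(1 - alpha/2) of the central prediction interval."""
    return dist.inv_cdf(1.0 - alpha / 2.0)


def optimistic_index(samples, t, alpha):
    """Empirical upper quantile plus a UCB1-style bonus.

    Illustrative stand-in only: the paper's policy replaces the plain
    empirical quantile with an adaptive conformal estimate.
    """
    n = len(samples)
    ordered = sorted(samples)
    # Index of the empirical (1 - alpha/2)-quantile, clamped to a valid range.
    k = math.ceil((1.0 - alpha / 2.0) * n) - 1
    k = min(max(k, 0), n - 1)
    # Optimism bonus that shrinks as the arm is pulled more often (t >= 1).
    bonus = math.sqrt(2.0 * math.log(t) / n)
    return ordered[k] + bonus


ALPHA = 0.1  # hypothetical miscoverage level

# Hypothetical arms: A has the higher mean, B has the heavier upper tail.
arms = {
    "A": NormalDist(mu=1.0, sigma=0.1),
    "B": NormalDist(mu=0.8, sigma=1.0),
}

best_by_mean = max(arms, key=lambda name: arms[name].mean)
best_by_tail = max(arms, key=lambda name: upper_tail_target(arms[name], ALPHA))
print(best_by_mean, best_by_tail)
```

With these made-up parameters the mean criterion prefers arm A while the upper-tail target prefers arm B, so a policy tuned for mean reward would concentrate on the wrong arm under the upper-quantile objective.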
