Conformal-Style Quantile Analyses for Stochastic Bandits

Chengyu Du; Mengfan Xu

arXiv:2605.07115 · cs.LG · May 11, 2026

TL;DR

This paper introduces ACP-UCB1, a conformal-style bandit algorithm that targets upper-tail performance (an upper quantile of each arm's reward distribution) rather than mean reward. It comes with logarithmic upper-quantile regret bounds and outperforms classical UCB1 in numerical experiments.

Contribution

The paper proposes ACP-UCB1, a conformal-style policy for stochastic bandits that targets upper-tail metrics, and supports it with a logarithmic upper-quantile regret guarantee and numerical comparisons against UCB1.

Findings

01

ACP-UCB1 achieves logarithmic upper-quantile regret.

02

It outperforms UCB1 in numerical experiments.

03

It provides a new approach to tail-focused bandit analysis.
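Finding 01 refers to upper-quantile regret, which this page does not define. Assuming regret is measured against the upper-endpoint target \(q_j = F_j^{-1}(1-\alpha/2)\) from the abstract (our assumption, not a definition taken from the paper), one natural formalization, with \(A_t\) the arm pulled at round \(t\) and \(N_j(T)\) the number of pulls of arm \(j\) up to round \(T\), is:

\[
\begin{aligned}
q^\star &= \max_j q_j, \qquad \Delta_j = q^\star - q_j, \\
R_T^{(\alpha)} &= \mathbb{E}\Big[\sum_{t=1}^{T} \big(q^\star - q_{A_t}\big)\Big]
= \sum_{j:\,\Delta_j > 0} \Delta_j \, \mathbb{E}\big[N_j(T)\big].
\end{aligned}
\]

The second equality simply regroups the sum by arm. Under this reading, logarithmic regret means \(R_T^{(\alpha)} = O(\log T)\) for a fixed instance, so each arm with a strictly smaller upper endpoint is pulled only \(O(\log T)\) times in expectation.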

Abstract

Stochastic bandit algorithms are usually analyzed under a mean-reward criterion, yet many problems favor arms with strong upper-tail performance; this is the setting we study. For a fixed miscoverage level \(\alpha\), the natural upper-tail target of arm \(j\) is the upper endpoint \(F_j^{-1}(1-\alpha/2)\) of a central prediction interval, where \(F_j\) is the reward distribution function of arm \(j\). This target can rank arms differently from their means, creating a central mismatch with the classical bandit objective. To address it, we propose ACP-UCB1, a conformal-style policy that combines an adaptive conformal estimate of the upper endpoint with a UCB-type optimism bonus. The technical challenge is that the conformity scores used by ACP-UCB1 are recomputed from evolving empirical quantile estimates and evaluated at an adaptive level. We control this endpoint through reward-quantile concentration, a perturbation argument for recomputed score quantiles,…
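The ranking mismatch described in the abstract, and the general shape of an "upper-quantile estimate plus optimism bonus" index, can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the two-arm instance (a deterministic arm paying 0.6 and a Bernoulli(0.3) arm), the choice \(\alpha = 0.2\), and the helper names `empirical_quantile`, `ucb_index`, and `quantile_ucb` are hypothetical. The index uses a plain empirical quantile plus the UCB1 bonus \(\sqrt{2\ln t / n_j}\); it is a simplified stand-in, not ACP-UCB1, which instead uses an adaptive conformal estimate built from recomputed conformity scores.

```python
import math
import random

ALPHA = 0.2               # miscoverage level alpha (illustrative choice)
LEVEL = 1 - ALPHA / 2     # level of the upper endpoint, 1 - alpha/2 = 0.9


def empirical_quantile(samples, level):
    """Empirical F^{-1}(level): the smallest sample x with F_n(x) >= level."""
    xs = sorted(samples)
    k = max(1, math.ceil(level * len(xs)))   # 1-indexed order statistic
    return xs[min(k, len(xs)) - 1]


# Hypothetical two-arm instance.
def steady_arm(_rng):
    return 0.6                                   # mean 0.6, upper endpoint 0.6


def risky_arm(rng):
    return 1.0 if rng.random() < 0.3 else 0.0    # mean 0.3, upper endpoint 1.0


ARMS = [steady_arm, risky_arm]


def ucb_index(samples, t, level):
    """Empirical upper quantile plus the UCB1 bonus sqrt(2 ln t / n)."""
    return empirical_quantile(samples, level) + math.sqrt(2 * math.log(t) / len(samples))


def quantile_ucb(horizon, level=LEVEL, seed=0):
    """Simplified quantile-UCB (NOT ACP-UCB1: no conformity scores, no
    adaptive level). Returns the number of pulls of each arm."""
    rng = random.Random(seed)
    history = [[] for _ in ARMS]
    for t in range(1, horizon + 1):
        if t <= len(ARMS):
            j = t - 1                                # pull each arm once first
        else:
            j = max(range(len(ARMS)), key=lambda i: ucb_index(history[i], t, level))
        history[j].append(ARMS[j](rng))
    return [len(h) for h in history]


if __name__ == "__main__":
    rng = random.Random(1)
    for name, arm in zip(["steady", "risky"], ARMS):
        xs = [arm(rng) for _ in range(10_000)]
        print(f"{name}: mean={sum(xs) / len(xs):.2f}, "
              f"upper endpoint={empirical_quantile(xs, LEVEL):.2f}")
    print("pulls [steady, risky]:", quantile_ucb(horizon=2000))
```

With \(\alpha = 0.2\), the steady arm has the higher mean (0.6 vs 0.3) but the risky arm has the higher upper endpoint (1.0 vs 0.6). A quantile-based index should therefore concentrate its pulls on the risky arm, whereas a mean-based index such as UCB1 would favor the steady one.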
