Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback

Haishan Ye

arXiv:2603.25029 · cs.LG · April 7, 2026

TL;DR

This paper establishes the first high-probability regret bounds for online convex optimization with two-point bandit feedback on strongly convex functions, achieving minimax optimality in dimension and time horizon.

Contribution

It provides the first high-probability regret bounds for strongly convex functions under two-point bandit feedback, resolving an open problem highlighted by Agarwal et al. (2010).

Findings

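1. Achieves a high-probability regret bound of O(d(log T + log(1/δ))/μ), holding with probability at least 1 − δ, for μ-strongly convex losses.
2. The bound is minimax optimal with respect to both the time horizon T and the dimension d.
3. Overcomes the heavy-tailed nature of bandit gradient estimators, which makes standard concentration analysis difficult.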
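Because the bound is stated only up to constants, the short Python sketch below evaluates the expression inside the O(·) with every hidden constant set to 1. That choice, and the helper name `regret_bound_order`, are illustrative assumptions rather than anything taken from the paper; the point is only to show the logarithmic growth in T and 1/δ and the linear growth in d.

```python
import math


def regret_bound_order(d: int, T: int, delta: float, mu: float) -> float:
    """Evaluate d * (log T + log(1/delta)) / mu, i.e. the bound with constants set to 1.

    Hypothetical helper for illustration only; the true bound hides unknown constants.
    """
    if d < 1 or T < 1 or not (0.0 < delta < 1.0) or mu <= 0.0:
        raise ValueError("need d >= 1, T >= 1, 0 < delta < 1, mu > 0")
    return d * (math.log(T) + math.log(1.0 / delta)) / mu


# Doubling the horizon adds only d * log(2) / mu to the expression.
b1 = regret_bound_order(d=10, T=1_000, delta=0.01, mu=1.0)
b2 = regret_bound_order(d=10, T=2_000, delta=0.01, mu=1.0)
print(round(b2 - b1, 4))  # 6.9315, i.e. 10 * ln 2
```

Under this reading, doubling T costs only an additive d·log 2/μ, whereas doubling d doubles the whole expression.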

Abstract

We consider the problem of Online Convex Optimization (OCO) with two-point bandit feedback. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions while observing only the value of each function at two points. Although it is well known that two-point feedback allows for gradient estimation, achieving tight high-probability regret bounds for strongly convex functions has remained open, as highlighted by Agarwal et al. (2010). The primary challenge lies in the heavy-tailed nature of bandit gradient estimators, which makes standard concentration analysis difficult. In this paper, we resolve this open challenge and provide the first high-probability regret bound of $O(d(\log T + \log(1/\delta))/\mu)$ for $\mu$-strongly convex losses. Our result is minimax optimal with respect to both the time horizon $T$ and the dimension $d$.
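To make the two-point mechanism concrete, here is a minimal Python sketch of the standard two-point gradient estimator in the style of Agarwal et al. (2010), plugged into projected online gradient descent with step size 1/(μt). It is not the paper's algorithm or analysis: the quadratic losses, the alternating adversary, the ball radius, the smoothing radius `r` (named so as not to clash with the confidence parameter δ), the convention of charging the loss at the played point, and all function names are assumptions chosen for illustration. It only shows how two function values per round yield a gradient estimate whose norm can be up to d times that of the true gradient, which hints at why concentration arguments are delicate here.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw u uniformly from the unit sphere in R^d by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        n = math.sqrt(sum(vi * vi for vi in v))
        if n > 0.0:
            return [vi / n for vi in v]


def two_point_gradient(f, x, r, rng):
    """Estimate (d / (2r)) * (f(x + r u) - f(x - r u)) * u from two loss queries."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    f_plus = f([xi + r * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - r * ui for xi, ui in zip(x, u)])
    scale = d * (f_plus - f_minus) / (2.0 * r)
    return [scale * ui for ui in u]


def project_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius}."""
    n = math.sqrt(sum(xi * xi for xi in x))
    return x if n <= radius else [xi * radius / n for xi in x]


def make_quadratic(c, mu):
    """Hypothetical mu-strongly convex loss f(x) = (mu / 2) * ||x - c||^2."""
    return lambda x: 0.5 * mu * sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def run_bandit_ogd(centers, mu, r=0.1, radius=2.0, seed=0):
    """Projected OGD with step 1/(mu * t), using only two loss values per round."""
    rng = random.Random(seed)
    d = len(centers[0])
    x = [0.0] * d
    total_loss = 0.0
    for t, c in enumerate(centers, start=1):
        f = make_quadratic(c, mu)
        total_loss += f(x)  # assumption: loss is charged at the played point x_t
        g = two_point_gradient(f, x, r, rng)
        step = 1.0 / (mu * t)
        x = project_ball([xi - step * gi for xi, gi in zip(x, g)], radius)
    # For these quadratics the best fixed point in hindsight is the mean center,
    # which lies inside the ball in the example below, so no projection is needed.
    T = len(centers)
    x_star = [sum(c[i] for c in centers) / T for i in range(d)]
    best_loss = sum(make_quadratic(c, mu)(x_star) for c in centers)
    return x, total_loss - best_loss, x_star


# Usage: an adversary alternating between two centers; the iterate drifts toward their mean.
d, T, mu = 5, 10_000, 1.0
a = [0.5, 0.2, 0.0, 0.0, 0.0]
b = [-0.3, 0.2, 0.4, 0.0, 0.0]
centers = [a if t % 2 == 0 else b for t in range(T)]
x_final, regret, x_star = run_bandit_ogd(centers, mu)
print(regret, x_star)
```

For these quadratics the estimator is exactly unbiased, since f(x + ru) − f(x − ru) = 2μr⟨x − c, u⟩ and E[uuᵀ] = I/d; for general convex losses it is unbiased only for a smoothed version of f.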
