Bandit Convex Optimization with Gradient Prediction Adaptivity
Shuche Wang, Adarsh Barik, Vincent Y. F. Tan

TL;DR
This paper studies how gradient predictions can be exploited in bandit convex optimization. It gives algorithms whose regret adapts to prediction accuracy and is near-optimal in both stationary and non-stationary environments.
Contribution
It proposes TP-VR-OPT (Two-Point Variance-Reduced Optimistic Gradient Descent), a variance-reduced optimistic gradient method for the two-point feedback setting. Its regret bounds depend on the prediction error and are nearly optimal.
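The Contribution line combines three ingredients: optimistic gradient steps, two-point feedback, and variance reduction through predictions. The following is a minimal sketch of how these can fit together. It is an illustration under stated assumptions, not the paper's TP-VR-OPT. The Euclidean-ball domain, the two-point estimator centred at the prediction m_t (used as a control variate), the step sizes, and the names `optimistic_two_point`, `predict`, and `project_ball` are all hypothetical. The update follows the standard two-sequence form of optimistic online gradient descent: x_t is chosen from the prediction, and the auxiliary iterate y is updated with the estimate.

```python
import math
import random


def sample_sphere(d, rng):
    """Uniform random direction on the unit sphere in R^d (normalised Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(vi * vi for vi in v))
    return [vi / n for vi in v]


def project_ball(v, radius):
    """Euclidean projection onto the ball {z : ||z|| <= radius}."""
    n = math.sqrt(sum(vi * vi for vi in v))
    return v if n <= radius else [vi * radius / n for vi in v]


def optimistic_two_point(loss, predict, d, T, eta, delta, radius, seed=0):
    """Hypothetical sketch: optimistic OGD driven by a two-point estimate centred at m_t.

    loss(t, y) returns f_t(y); predict(t, y) returns the gradient prediction m_t.
    Returns the list of points x_t around which the loss is queried.
    """
    rng = random.Random(seed)
    y = [0.0] * d              # auxiliary iterate, updated with the estimated gradient
    inner = radius - delta     # shrunk ball so that x_t +/- delta*u stays feasible
    played = []
    for t in range(T):
        m = predict(t, y)
        # Optimistic step: move from y along the *predicted* gradient to choose x_t.
        x = project_ball([yi - eta * mi for yi, mi in zip(y, m)], inner)
        u = sample_sphere(d, rng)
        f_plus = loss(t, [xi + delta * ui for xi, ui in zip(x, u)])
        f_minus = loss(t, [xi - delta * ui for xi, ui in zip(x, u)])
        # Two-point estimate of (grad f_t - m_t) along u, added back onto m_t.
        # m_t acts as a control variate: for smooth losses and small delta the noise
        # is driven by grad f_t(x_t) - m_t rather than by the gradient itself.
        m_dot_u = sum(mi * ui for mi, ui in zip(m, u))
        scale = d / (2 * delta) * (f_plus - f_minus - 2 * delta * m_dot_u)
        g = [mi + scale * ui for mi, ui in zip(m, u)]
        # Update the auxiliary iterate with the estimated gradient.
        y = project_ball([yi - eta * gi for yi, gi in zip(y, g)], inner)
        played.append(x)
    return played


# Usage: a fixed quadratic loss and an idealised, accurate predictor.
c = [0.3, -0.2, 0.1]
loss = lambda t, y: sum((yi - ci) ** 2 for yi, ci in zip(y, c))
predict = lambda t, y: [2 * (yi - ci) for yi, ci in zip(y, c)]
xs = optimistic_two_point(loss, predict, d=3, T=500, eta=0.1, delta=0.01, radius=1.0)
```

For a quadratic loss, the estimate's deviation from the true gradient is exactly (d·u·uᵀ − I)(∇f_t(x_t) − m_t). An accurate predictor therefore makes the bandit update behave almost like full-information gradient descent, and in this usage example the played points converge to the minimiser c. If `predict` returns zeros, the same loop still runs, but each estimate then carries noise on the scale of the full gradient.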
Findings
Regret bounds scale with the square root of the prediction error and dimension.
A lower bound matches the upper bound up to a factor of .
Adaptive algorithms eliminate the need for prior knowledge of prediction error or horizon.
Abstract
Bandit convex optimization (BCO) is a fundamental online learning framework with partial feedback, where the learner observes only the loss incurred at the chosen decision point in each round. In this work, we investigate whether optimistic gradient predictions can improve worst-case regret guarantees in a prediction-adaptive manner. Specifically, given gradient predictions, we seek regret bounds that scale with the cumulative prediction error, that is, the accumulated discrepancy between the predicted and the true gradients. We first establish a negative result: under the single-point feedback protocol, an unavoidable regret lower bound persists even when the predictions are exact, showing that the variance of gradient estimation fundamentally obscures the benefit of accurate predictions. To overcome this barrier, we propose Two-Point Variance-Reduced Optimistic Gradient Descent (TP-VR-OPT) for the…
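The abstract attributes the single-point barrier to the variance of gradient estimation. The sketch below illustrates only that intuition and is not the paper's lower-bound argument. It compares two hypothetical estimators, `one_point` and `two_point`. Both use the prediction m as a control variate, so both are unbiased for the gradient of the ball-smoothed loss. The test case is a linear loss with an exact prediction, where the prediction error is zero. With one query, the loss value f(x) itself cannot be cancelled, so the error stays at (d·f(x)/δ)² no matter how good m is. With two symmetric queries, f(x) cancels and the error vanishes up to rounding.

```python
import math
import random


def sample_sphere(d, rng):
    """Uniform random direction on the unit sphere in R^d (normalised Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(vi * vi for vi in v))
    return [vi / n for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def one_point(f, x, m, delta, rng):
    """One query per round: m + (d/delta) * (f(x + delta*u) - delta*<m,u>) * u."""
    d = len(x)
    u = sample_sphere(d, rng)
    val = f([xi + delta * ui for xi, ui in zip(x, u)])
    scale = (d / delta) * (val - delta * dot(m, u))
    return [mi + scale * ui for mi, ui in zip(m, u)]


def two_point(f, x, m, delta, rng):
    """Two queries per round: the loss value f(x) cancels in the difference."""
    d = len(x)
    u = sample_sphere(d, rng)
    plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = (d / (2 * delta)) * (plus - minus - 2 * delta * dot(m, u))
    return [mi + scale * ui for mi, ui in zip(m, u)]


def mean_sq_error(estimator, f, x, m, grad, delta, trials=1000, seed=0):
    """Average squared distance between the estimate and the true gradient."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = estimator(f, x, m, delta, rng)
        total += sum((gi - ti) ** 2 for gi, ti in zip(g, grad))
    return total / trials


# Linear loss with an exact gradient prediction (m equals the true gradient a).
a = [1.0, -2.0, 0.5, 3.0, 1.0]
f = lambda y: dot(a, y) + 1.0
x = [0.1] * 5
mse_one = mean_sq_error(one_point, f, x, a, a, delta=0.1)
mse_two = mean_sq_error(two_point, f, x, a, a, delta=0.1)
```

Here f(x) = 1.35, d = 5 and δ = 0.1. Consequently, `mse_one` is about 67.5² = 4556.25 on every trial, while `mse_two` is zero up to floating-point rounding. Shrinking δ makes the single-point error larger, not smaller.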
