Non-Stationary Bandit Convex Optimization: An Optimal Algorithm with Two-Point Feedback
Chang He, Bo Jiang, Shuzhong Zhang

TL;DR
This paper introduces an optimal algorithm for non-stationary bandit convex optimization with two-point feedback, achieving near-optimal dynamic regret bounds in Euclidean and non-Euclidean settings, including the simplex and cross-polytope.
Contribution
It extends bandit mirror descent to non-stationary environments with two-point feedback, providing nearly optimal regret bounds in various geometric settings.
Findings
Achieves optimal regret bounds in Euclidean space matching previous lower bounds.
Extends to non-Euclidean settings like the simplex and cross-polytope with near-optimal bounds.
Improves upon the regret bound of previous work in Euclidean space.
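To make the simplex setting concrete, here is a minimal sketch of one mirror descent step under the negative-entropy mirror map, the standard textbook choice on the simplex (also known as the exponentiated-gradient update). This summary does not state which mirror map the paper uses, so the mirror map, the function name `entropic_mirror_step`, and the step size `eta` are all assumptions made for illustration.

```python
import math


def entropic_mirror_step(x, grad, eta):
    """One mirror descent step on the probability simplex.

    Assumes the negative-entropy mirror map (a textbook choice, not
    necessarily the paper's), which gives a multiplicative update
    followed by renormalization.
    """
    # Shift the gradient by its minimum for numerical stability; the shift
    # cancels after normalization and does not change the result.
    g_min = min(grad)
    weights = [xi * math.exp(-eta * (gi - g_min)) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical usage: start at the uniform distribution and penalize coordinate 0.
x_next = entropic_mirror_step([0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0], eta=0.5)
```

The iterate stays on the simplex without an explicit projection, which is one reason entropy-based mirror maps are commonly paired with this geometry.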
Abstract
This paper studies bandit convex optimization in non-stationary environments with two-point feedback, using dynamic regret as the performance measure. We propose an algorithm based on bandit mirror descent that extends naturally to non-Euclidean settings. The regret bounds are expressed in terms of the total number of iterations and the path variation, measured with respect to a chosen norm. In Euclidean space, our algorithm matches the optimal regret bound, improving upon the bound of Zhao et al. (2021). Beyond Euclidean settings, our algorithm achieves an upper bound on the simplex that is nearly optimal up to log factors. For the cross-polytope, the bound takes a reduced form for a suitable parameter choice.
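The ingredients named in the abstract can be illustrated with a small Euclidean sketch: a two-point gradient estimator, a projected gradient step (mirror descent with the squared Euclidean norm as mirror map), dynamic regret against a moving comparator, and the path variation of that comparator sequence. This is not the paper's algorithm. The drifting quadratic losses, the unit-ball domain, the step size `eta`, the smoothing radius `delta`, and all function names are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two function evaluations."""
    d = len(x)
    # Draw a direction uniformly from the unit sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    u = [v / norm for v in g]
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [radius * v / norm for v in x]


def path_variation(comparators, p):
    """Sum of p-norm distances between consecutive comparators."""
    total = 0.0
    for prev, cur in zip(comparators, comparators[1:]):
        total += sum(abs(a - b) ** p for a, b in zip(prev, cur)) ** (1.0 / p)
    return total


def run(T=2000, d=5, eta=0.05, delta=1e-3, seed=0):
    """Run the sketch on drifting quadratics (a hypothetical test problem)."""
    rng = random.Random(seed)
    x = [0.0] * d
    comparators = []
    regret = 0.0
    for t in range(T):
        # The minimizer drifts slowly and always stays inside the unit ball.
        target = [0.4 * math.sin(2.0 * math.pi * t / T + i) for i in range(d)]

        def f(z, c=target):
            return sum((zi - ci) ** 2 for zi, ci in zip(z, c))

        # Dynamic regret compares against the per-round minimizer, where f is 0.
        regret += f(x) - f(target)
        comparators.append(target)
        grad = two_point_gradient(f, x, delta, rng)
        step = [xi - eta * gi for xi, gi in zip(x, grad)]
        # Shrink the radius by delta so both query points stay in the unit ball.
        x = project_ball(step, 1.0 - delta)
    return regret, path_variation(comparators, 2)


regret, variation = run()
```

Calling `run()` returns the cumulative dynamic regret and the Euclidean path variation of the comparator sequence, the two quantities that the bounds in the abstract relate to each other.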
