TL;DR
This paper introduces a kernel-based algorithm for the adversarial convex bandit problem that achieves $\tilde{O}(n^{9.5} \sqrt{T})$ regret in polynomial time, improving on earlier regret and running-time bounds in derivative-free optimization.
Contribution
The paper presents the first polynomial-time algorithm with $\mathrm{poly}(n)\sqrt{T}$ regret for the adversarial convex bandit problem, built on kernel methods, a generalization of Bernoulli convolutions, and a new annealing schedule for exponential weights.
Findings
Achieves $\tilde{O}(n^{9.5} \sqrt{T})$-regret with polynomial time
A variant runs in $\mathrm{poly}(n \log(T))$ time per step at the cost of an additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret
Improves upon previous regret and time complexity bounds
Abstract
We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5} \sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost of an additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11} \sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors, and the $\log(T)^{\mathrm{poly}(n)} \sqrt{T}$-regret and…
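To give a feel for idea (iii), here is a minimal sketch of a generic exponential-weights update over a small finite action set, run with a learning rate that grows over time. It is not the paper's algorithm: the paper works with bandit feedback over a continuous convex domain and uses kernel-based loss estimates, whereas this sketch assumes full-information losses. The function names, the three-action loss table, and the schedule `0.1 * 1.05 ** t` are all hypothetical choices made for illustration.

```python
import math


def exp_weights_step(probs, losses, eta):
    """One multiplicative update: p_i <- p_i * exp(-eta * loss_i), then renormalize."""
    updated = [p * math.exp(-eta * loss) for p, loss in zip(probs, losses)]
    total = sum(updated)
    return [u / total for u in updated]


def run_exp_weights(loss_rounds, etas):
    """Run exponential weights from the uniform distribution.

    loss_rounds: list of per-round loss vectors (full information, an assumption).
    etas: per-round learning rates (hypothetical schedule supplied by the caller).
    Returns the final distribution and the cumulative expected loss.
    """
    k = len(loss_rounds[0])
    probs = [1.0 / k] * k
    cumulative_loss = 0.0
    for losses, eta in zip(loss_rounds, etas):
        # Expected loss of playing a random action drawn from the current distribution.
        cumulative_loss += sum(p * loss for p, loss in zip(probs, losses))
        probs = exp_weights_step(probs, losses, eta)
    return probs, cumulative_loss


# Toy instance: action 0 is always best, action 1 always worst.
T = 50
loss_rounds = [[0.0, 1.0, 0.5] for _ in range(T)]
# Illustrative increasing learning rate; not the schedule from the paper.
etas = [0.1 * 1.05 ** t for t in range(T)]
final_probs, cumulative_loss = run_exp_weights(loss_rounds, etas)
```

On this toy instance the distribution concentrates on action 0, and the growing learning rate makes later rounds move the weights more aggressively than earlier ones.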
Videos
Kernel-Based Methods for Bandit Convex Optimization · YouTube
