Bandit Convex Optimisation Revisited: FTRL Achieves $\tilde{O}(t^{1/2})$ Regret
David Young, Douglas Leith, George Iosifidis

TL;DR
This paper demonstrates that a kernel estimator can be adapted into a sampling-based bandit estimator, enabling the FTRL algorithm to achieve near-optimal regret bounds in adversarial convex optimization settings.
Contribution
It introduces a novel method to convert kernel estimators into bandit estimators and applies this to improve regret bounds for FTRL in bandit convex optimization.
Findings
Achieves $\tilde{O}(t^{1/2})$ regret with FTRL in bandit convex optimization.
Provides a simple conversion of kernel estimators into bandit estimators.
Enhances the theoretical understanding of bandit algorithms with kernel methods.
Abstract
We show that a kernel estimator using multiple function evaluations can be easily converted into a sampling-based bandit estimator with expectation equal to the original kernel estimate. Plugging such a bandit estimator into the standard FTRL algorithm yields a bandit convex optimisation algorithm that achieves $\tilde{O}(t^{1/2})$ regret against adversarial time-varying convex loss functions.
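The conversion described in the abstract can be illustrated with a generic importance-sampling sketch. This is not taken from the paper: it assumes the multi-evaluation estimate has the form of a weighted sum of function values, sum_i w_i f(x_i), and the names `kernel_estimate`, `bandit_estimate`, and `exact_expectation` are hypothetical. The single-evaluation estimator samples one index with probability proportional to |w_i| and rescales the observed value so that its expectation matches the weighted sum.

```python
import random


def sampling_probs(weights):
    """Probabilities proportional to |w_i| (assumes at least one nonzero weight)."""
    total = sum(abs(w) for w in weights)
    return [abs(w) / total for w in weights]


def kernel_estimate(f, points, weights):
    """Multi-evaluation estimate: a weighted sum of several evaluations of f."""
    return sum(w * f(x) for x, w in zip(points, weights))


def bandit_estimate(f, points, weights, rng):
    """Single-evaluation estimate with the same expectation as kernel_estimate.

    One index i is drawn with probability p_i, f is queried only at points[i],
    and the value is rescaled by w_i / p_i (importance weighting).
    """
    probs = sampling_probs(weights)
    i = rng.choices(range(len(points)), weights=probs, k=1)[0]
    return weights[i] / probs[i] * f(points[i])


def exact_expectation(f, points, weights):
    """Expectation of bandit_estimate, computed by enumerating every index."""
    probs = sampling_probs(weights)
    return sum(
        p * (w / p) * f(x)
        for x, w, p in zip(points, weights, probs)
        if p > 0  # zero-weight points are never sampled and contribute nothing
    )
```

Under these assumptions, `exact_expectation` agrees with `kernel_estimate` up to floating-point rounding, while each call to `bandit_estimate` uses only one function evaluation, which is the feedback model of the bandit setting. The variance of the single-sample estimate depends on the weights and is not addressed by this sketch.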
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques
