Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation
Tor Lattimore

TL;DR
This paper improves the theoretical upper bound on the minimax regret for zeroth-order adversarial bandit convex optimization, reducing the dependence on dimension and logarithmic factors.
Contribution
It introduces an improved exploratory distribution for convex functions, leading to a tighter regret bound compared to previous work.
Findings
Reduced the regret bound from $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ to $O(d^{2.5} \sqrt{n} \log(n))$
Demonstrated the effectiveness of the new exploratory distribution in convex bandit optimization
Provided a novel proof technique based on information-theoretic analysis
Abstract
We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
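To get a feel for the size of the improvement, the sketch below evaluates both bounds with all hidden constants set to 1, which is an assumption made purely for illustration (big-O bounds do not specify constants). The function names and the sample values of $d$ and $n$ are hypothetical and do not come from the paper. Since both bounds share the $\sqrt{n}$ factor, their ratio reduces to $d^{7} \log(n)^{6.5}$.

```python
import math


def bound_bubeck_2017(d: int, n: int) -> float:
    """Earlier bound d^9.5 * sqrt(n) * log(n)^7.5, constants taken as 1 (assumption)."""
    return d**9.5 * math.sqrt(n) * math.log(n) ** 7.5


def bound_improved(d: int, n: int) -> float:
    """Improved bound d^2.5 * sqrt(n) * log(n), constants taken as 1 (assumption)."""
    return d**2.5 * math.sqrt(n) * math.log(n)


def improvement_factor(d: int, n: int) -> float:
    """Ratio of the two bounds; algebraically equal to d^7 * log(n)^6.5."""
    return bound_bubeck_2017(d, n) / bound_improved(d, n)


if __name__ == "__main__":
    # Illustrative (d, n) pairs only; n must exceed 1 so that log(n) > 0.
    for d, n in [(2, 10**4), (10, 10**6)]:
        print(f"d={d}, n={n}: ratio of bounds = {improvement_factor(d, n):.3e}")
```

The ratio grows polynomially in the dimension and polylogarithmically in the horizon, so the gain is driven mainly by $d$.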
