Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Tor Lattimore

arXiv:2006.00475·math.OC·September 28, 2020

TL;DR

This paper tightens the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation, lowering the exponents on both the dimension $d$ and the $\log(n)$ factor.

Contribution

It identifies an improved exploratory distribution for convex functions, which yields a tighter regret bound than that of Bubeck et al. (2017).

Findings

01

Reduced the regret bound from $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ to $O(d^{2.5} \sqrt{n} \log(n))$

02

Showed that the new exploratory distribution is what drives the tighter bound in bandit convex optimisation

03

Provided a novel proof technique based on information-theoretic analysis

Abstract

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
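
To get a feel for the size of the improvement, the two bounds can be compared numerically. The sketch below is illustrative only: it assumes the constants hidden by the $O(\cdot)$ notation are 1 and uses the natural logarithm, neither of which is stated in the abstract, and the function names are hypothetical. Under those assumptions the ratio of the old bound to the new one is $d^{7} \log(n)^{6.5}$, independent of the shared $\sqrt{n}$ factor.

```python
import math


def old_bound(d: int, n: int) -> float:
    """d^9.5 * sqrt(n) * log(n)^7.5 (Bubeck et al., 2017), hidden constant assumed to be 1."""
    return d ** 9.5 * math.sqrt(n) * math.log(n) ** 7.5


def new_bound(d: int, n: int) -> float:
    """d^2.5 * sqrt(n) * log(n) (this paper), hidden constant assumed to be 1."""
    return d ** 2.5 * math.sqrt(n) * math.log(n)


def improvement_factor(d: int, n: int) -> float:
    """Closed-form ratio old/new: the sqrt(n) terms cancel, leaving d^7 * log(n)^6.5."""
    return d ** 7 * math.log(n) ** 6.5


if __name__ == "__main__":
    # Example values chosen arbitrarily for illustration.
    for d, n in [(2, 10**4), (10, 10**6), (50, 10**8)]:
        ratio = old_bound(d, n) / new_bound(d, n)
        print(f"d={d:>3}, n={n:.0e}: old/new = {ratio:.3e}")
```

Because the true constants are unknown here, the printed ratios indicate only how the gap scales with $d$ and $n$, not the actual regret of either method.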
