Online Newton Method for Bandit Convex Optimisation

Hidde Fokkema; Dirk van der Hoeven; Tor Lattimore; Jack J. Mayo

arXiv:2406.06506 · math.OC · June 11, 2024

TL;DR

This paper presents a computationally efficient algorithm for zeroth-order bandit convex optimisation and proves regret bounds for it in both the adversarial and stochastic settings.

Contribution

It introduces an online Newton method tailored to bandit convex optimisation that is computationally efficient and whose regret bound is tighter in the stochastic setting than in the adversarial one.

Findings

01

Regret bound of $d^{3.5} \sqrt{n} \, \text{polylog}(n, d)$ in the adversarial setting

02

Improved regret bound of $M d^{2} \sqrt{n} \, \text{polylog}(n, d)$ in the stochastic setting

03

Algorithm is computationally efficient for high-dimensional problems

Abstract

We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \, \text{polylog}(n, d)$ with high probability, where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $M d^{2} \sqrt{n} \, \text{polylog}(n, d)$, where $M \in [d^{-1/2}, d^{-1/4}]$ is a constant that depends on the geometry of the constraint set and the desired computational properties.
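
As a rough worked illustration of the stochastic bound, the two endpoints of the stated range for $M$ can be substituted directly. Polylogarithmic factors are suppressed here, which is a simplification made for this sketch and not a statement from the paper:

$$
M = d^{-1/2}: \quad M d^{2} \sqrt{n} = d^{3/2} \sqrt{n},
\qquad
M = d^{-1/4}: \quad M d^{2} \sqrt{n} = d^{7/4} \sqrt{n}.
$$

Under that simplification, the stochastic bound lies between $d^{1.5} \sqrt{n}$ and $d^{1.75} \sqrt{n}$, which is smaller than the adversarial bound $d^{3.5} \sqrt{n}$ by a factor between $d^{1.75}$ and $d^{2}$.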

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Sparse and Compressive Sensing Techniques

Methods: Sparse Evolutionary Training