Online Newton Method for Bandit Convex Optimisation
Hidde Fokkema, Dirk van der Hoeven, Tor Lattimore, Jack J. Mayo

TL;DR
This paper presents a computationally efficient zeroth-order bandit convex optimisation algorithm with provable high-probability regret bounds in both the adversarial and stochastic settings.
Contribution
It introduces a novel online Newton method tailored for bandit convex optimization, achieving improved regret bounds and computational efficiency.
Findings
Regret bound of $d^{3.5} \sqrt{n} \, \text{polylog}(n, d)$ in the adversarial setting
Improved regret bound of $M d^{2} \sqrt{n} \, \text{polylog}(n, d)$ in the stochastic setting
Algorithm is computationally efficient for high-dimensional problems
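To get a feel for how the two bounds compare, the following is a minimal sketch that evaluates only their leading terms. It drops the polylog factors and all constants, so the numbers are orders of magnitude at best. The function names and the value passed for `m` are hypothetical; the summary does not state a numerical value for $M$.

```python
import math


def adversarial_leading_term(d: int, n: int) -> float:
    """Leading term d^3.5 * sqrt(n); polylog(n, d) factors and constants are ignored."""
    return d ** 3.5 * math.sqrt(n)


def stochastic_leading_term(d: int, n: int, m: float) -> float:
    """Leading term M * d^2 * sqrt(n); polylog(n, d) factors and constants are ignored.

    `m` stands in for the geometry-dependent constant M. Its value is an
    assumption supplied by the caller, not something taken from the paper.
    """
    return m * d ** 2 * math.sqrt(n)


def bound_ratio(d: int, m: float) -> float:
    """Ratio of the adversarial to the stochastic leading term.

    The sqrt(n) factor cancels, leaving d^1.5 / M.
    """
    return d ** 1.5 / m


if __name__ == "__main__":
    d, n = 4, 100
    m = 0.5  # placeholder value for M, chosen only for illustration
    print(adversarial_leading_term(d, n))
    print(stochastic_leading_term(d, n, m))
    print(bound_ratio(d, m))
```

Because both leading terms scale as $\sqrt{n}$, the gap between them depends only on the dimension and on $M$, not on the time horizon.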
Abstract
We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \, \text{polylog}(n, d)$ with high probability, where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $M d^{2} \sqrt{n} \, \text{polylog}(n, d)$, where $M$ is a constant that depends on the geometry of the constraint set and the desired computational properties.
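For readers unfamiliar with the quantity being bounded, regret in bandit convex optimisation is commonly defined as below. The notation is an assumption made here for illustration ($K$ for the constraint set, $f_t$ for the convex loss in round $t$, $X_t$ for the learner's action) and is not quoted from the abstract.

```latex
\mathrm{Reg}_n = \sup_{x \in K} \sum_{t=1}^{n} \bigl( f_t(X_t) - f_t(x) \bigr)
```

In the zeroth-order (bandit) setting the learner observes only the loss value at the point it plays, possibly with noise, and never a gradient. A bound of order $\sqrt{n}$ means the average regret per round vanishes as $n$ grows.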
Taxonomy
Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Sparse and Compressive Sensing Techniques
Methods: Sparse Evolutionary Training
