
TL;DR
This paper introduces an adaptive algorithm for noisy global optimization and continuum-armed bandits that achieves square-root regret without prior information, by reducing these problems to tree-armed bandits and adaptively combining multiple trees.
Contribution
It presents a novel adaptive algorithm for noisy global optimization and continuum-armed bandits, and introduces new results in the tree-armed bandit setting, including near-matching lower bounds on the regret.
Findings
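Achieves square-root regret in bandits and inverse-square-root error in optimization, over continuous reward functions with finitely many polynomial maxima and without prior information.
Shows that multiple trees can be combined adaptively so as to minimize the regret.
Provides near-matching lower bounds on the regret in terms of the zooming dimension.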
Abstract
We describe a novel algorithm for noisy global optimisation and continuum-armed bandits, with good convergence properties over any continuous reward function having finitely many polynomial maxima. Over such functions, our algorithm achieves square-root regret in bandits, and inverse-square-root error in optimisation, without prior information. Our algorithm works by reducing these problems to tree-armed bandits, and we also provide new results in this setting. We show it is possible to adaptively combine multiple trees so as to minimise the regret, and also give near-matching lower bounds on the regret in terms of the zooming dimension.
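To make the idea of a tree-armed bandit concrete, here is a minimal Python sketch of optimistic search over a dyadic tree of subintervals of [0, 1]. It is not the algorithm from the paper: it uses a single fixed tree and does not adaptively combine multiple trees. The function name `tree_bandit`, the Gaussian noise level, the width term in the index, and the rule for splitting a cell are all assumptions chosen for illustration.

```python
import math
import random


def tree_bandit(reward, horizon, noise_sd=0.1, seed=0):
    """Illustrative optimistic search over a dyadic partition of [0, 1].

    Returns the sampled points and the final leaf cells. Every constant
    here is an assumption made for this sketch, not taken from the paper.
    """
    rng = random.Random(seed)
    # Each leaf cell is [lo, hi, pulls, reward_sum]; start from the two halves.
    leaves = [[0.0, 0.5, 0, 0.0], [0.5, 1.0, 0, 0.0]]
    sampled = []

    for t in range(1, horizon + 1):
        def index(cell):
            lo, hi, n, total = cell
            if n == 0:
                return math.inf  # pull every new cell at least once
            # Empirical mean + confidence width + cell width. The last term
            # is an assumed allowance for variation of the reward in the cell.
            return total / n + math.sqrt(2.0 * math.log(t) / n) + (hi - lo)

        i = max(range(len(leaves)), key=lambda j: index(leaves[j]))
        cell = leaves[i]
        x = rng.uniform(cell[0], cell[1])
        y = reward(x) + rng.gauss(0.0, noise_sd)  # noisy observation
        cell[2] += 1
        cell[3] += y
        sampled.append(x)

        # Assumed refinement rule: split a cell at depth d after 4**d pulls.
        depth = round(-math.log2(cell[1] - cell[0]))
        if cell[2] >= 4 ** depth:
            mid = (cell[0] + cell[1]) / 2.0
            leaves[i:i + 1] = [[cell[0], mid, 0, 0.0], [mid, cell[1], 0, 0.0]]

    return sampled, leaves


# Example: a reward function with a single maximum at x = 0.3.
points, cells = tree_bandit(lambda x: 1.0 - abs(x - 0.3), horizon=500)
```

In this sketch each cell of the tree acts as an arm, and the partition is refined only where a cell has been pulled often. The quality of such a scheme depends on how well the tree suits the reward function, which is the issue the paper addresses by combining trees adaptively.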
