Comparator-adaptive Convex Bandits
Dirk van der Hoeven, Ashok Cutkosky, Haipeng Luo

TL;DR
This paper introduces comparator-adaptive algorithms for convex bandit optimization that achieve low regret when the comparator's norm is small, extending techniques from full-information to bandit settings.
Contribution
The paper develops the first comparator-adaptive algorithms for convex bandits, using a new single-point gradient estimator and carefully designed surrogate losses to carry comparator adaptivity over from the full-information setting to the bandit setting.
Findings
Regret bounds adapt to comparator norm
New single-point gradient estimator for convex bandits
Extension from linear to convex bandit settings
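
For orientation, the quantity these findings concern can be written out. This is the standard definition of expected regret against a fixed comparator $u$, using notation assumed here rather than taken from the paper ($f_t$ for the loss at round $t$, $x_t$ for the learner's point, $T$ for the horizon):

```latex
% Expected regret against a fixed comparator u after T rounds
% (notation assumed for illustration).
\[
  R_T(u) \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t(u)\right]
\]
% A comparator-adaptive bound is one whose right-hand side grows with
% \|u\|, so that it is small whenever \|u\| is small.
```

As a familiar reference point only, full-information comparator-adaptive bounds typically scale like $\|u\|\sqrt{T}$ up to logarithmic factors; the precise rates for the bandit settings are given in the paper and are not reproduced here.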
Abstract
We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart. Specifically, we develop convex bandit algorithms with regret bounds that are small whenever the norm of the comparator is small. We first use techniques from the full-information setting to develop comparator-adaptive algorithms for linear bandits. Then, we extend the ideas to convex bandits with Lipschitz or smooth loss functions, using a new single-point gradient estimator and carefully designed surrogate losses.
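
The abstract does not spell out the new single-point gradient estimator, so the sketch below shows the classic one-point estimator from the earlier bandit convex optimization literature instead, purely to illustrate what "single-point" means: the gradient is estimated from one loss evaluation per round. The function names, the step `delta`, and the linear test loss are assumptions chosen for illustration and are not the paper's construction.

```python
import numpy as np


def one_point_gradient_estimate(f, x, delta, rng):
    """Classic one-point estimator (not the paper's new estimator).

    Queries the loss once at a randomly perturbed point and returns
    (d / delta) * f(x + delta * u) * u, with u uniform on the unit sphere.
    In expectation this equals the gradient of a smoothed version of f.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # normalize to a uniform direction on the sphere
    loss = f(x + delta * u)         # the only bandit feedback used
    return (d / delta) * loss * u


def average_estimate(f, x, delta, n_samples, seed):
    """Average many independent one-point estimates at the same point x."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        total += one_point_gradient_estimate(f, x, delta, rng)
    return total / n_samples


if __name__ == "__main__":
    # Hypothetical linear loss f(z) = <theta, z>; its gradient is theta everywhere.
    theta = np.array([1.0, -2.0, 0.5])
    estimate = average_estimate(lambda z: float(theta @ z), np.zeros(3), 0.5, 50_000, seed=0)
    print("averaged estimate:", estimate)
    print("true gradient:    ", theta)
```

For a linear loss the smoothed function coincides with the loss itself, so the averaged estimate should approach `theta`. Each individual estimate is very noisy, with variance growing as `delta` shrinks, which is one reason estimator and surrogate-loss design matters in this setting.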
Taxonomy
Topics: Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques · Reinforcement Learning in Robotics
