Comparator-adaptive Convex Bandits

Dirk van der Hoeven; Ashok Cutkosky; Haipeng Luo

arXiv:2007.08448 · cs.LG · July 17, 2020

TL;DR

This paper introduces comparator-adaptive algorithms for convex bandit optimization that achieve low regret when the comparator's norm is small, extending techniques from full-information to bandit settings.

Contribution

The paper develops the first comparator-adaptive convex bandit algorithms, using a new single-point gradient estimator and surrogate losses to carry comparator adaptivity from the full-information setting over to the bandit setting.

Findings

01. Regret bounds adapt to the comparator norm

02. New single-point gradient estimator for convex bandits

03. Extension from linear to convex bandit settings
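
For readers new to the term, the first finding can be read against the standard definition of regret relative to a comparator. The notation below (losses f_t, learner's points w_t, comparator u, horizon T, and a generic bound B_T) is assumed for illustration and does not come from this page. The exact rates are in the paper and are not reproduced here. In the bandit setting the regret is usually stated in expectation over the learner's randomness.

```latex
R_T(u) = \sum_{t=1}^{T} \bigl( f_t(w_t) - f_t(u) \bigr),
\qquad
\mathbb{E}\bigl[ R_T(u) \bigr] \le B_T(\|u\|) \quad \text{for every comparator } u,
```

where B_T is nondecreasing in the norm of u. A bound of this shape is comparator-adaptive in the sense that it holds for all comparators at once and is small when the comparator's norm is small, instead of scaling with the diameter of the whole domain.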

Abstract

We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart. Specifically, we develop convex bandit algorithms with regret bounds that are small whenever the norm of the comparator is small. We first use techniques from the full-information setting to develop comparator-adaptive algorithms for linear bandits. Then, we extend the ideas to convex bandits with Lipschitz or smooth loss functions, using a new single-point gradient estimator and carefully designed surrogate losses.
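
As background for the phrase "single-point gradient estimator", the sketch below shows the classical one-point estimator of Flaxman, Kalai and McMahan (2005), which builds a gradient estimate from one loss evaluation per round. It is not the new estimator introduced in this paper, which this page does not describe. The function names, the smoothing radius `delta`, and the linear test loss are all illustrative assumptions.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a point uniformly from the unit sphere by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def one_point_gradient(loss, x, delta, rng):
    """Classical one-point estimate: (d / delta) * loss(x + delta * u) * u.

    Only one loss value is observed (bandit feedback). The estimate is unbiased
    for the gradient of a delta-smoothed version of the loss, not of the loss itself.
    """
    dim = len(x)
    u = sample_unit_sphere(dim, rng)
    query = [xi + delta * ui for xi, ui in zip(x, u)]
    value = loss(query)  # the single function evaluation for this round
    scale = dim / delta * value
    return [scale * ui for ui in u]


def average_estimate(loss, x, delta, n_samples, seed=0):
    """Average many one-point estimates to illustrate what they estimate in expectation."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        g = one_point_gradient(loss, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]


if __name__ == "__main__":
    a = [1.0, -2.0, 0.5]  # hypothetical linear loss f(w) = <a, w>
    linear_loss = lambda w: sum(ai * wi for ai, wi in zip(a, w))
    print(average_estimate(linear_loss, [0.2, 0.1, -0.3], delta=0.5, n_samples=100000))
```

For a linear loss the smoothed function equals the loss itself, so the averaged estimate should approach the vector `a`. Each individual estimate has magnitude on the order of d / delta, and that variance is the usual price of observing only one loss value per round.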

Videos

Comparator-Adaptive Convex Bandits · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques · Reinforcement Learning in Robotics