An optimal algorithm for bandit convex optimization

Elad Hazan; Yuanzhi Li

arXiv:1603.04350 · cs.LG · March 16, 2016 · 25 cites

TL;DR

This paper introduces an algorithm for bandit convex optimization, based on the ellipsoid method, that achieves $\tilde{O}(\sqrt{T})$ regret. This bound is optimal up to logarithmic factors.

Contribution

It presents the first $\tilde{O}(\sqrt{T})$-regret algorithm for bandit convex optimization using a novel ellipsoid-based approach, improving upon prior methods.

Findings

01

Achieves a regret bound that is tight up to logarithmic factors.

02

Introduces new tools in discrete convex geometry.

03

Demonstrates the effectiveness of the ellipsoid method in online learning.

Abstract

We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization. We give the first $\tilde{O}(\sqrt{T})$-regret algorithm for this setting based on a novel application of the ellipsoid method to online learning. This bound is known to be tight up to logarithmic factors. Our analysis introduces new tools in discrete convex geometry.
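
For reference, regret in this setting is usually defined as below. The notation is an assumption made here for illustration and does not appear in the abstract: $\mathcal{K}$ is the convex decision set, $f_t$ is the convex loss chosen by the adversary in round $t$, and $x_t \in \mathcal{K}$ is the point the learner plays. Under bandit feedback the learner observes only the scalar value $f_t(x_t)$, not $f_t$ itself or its gradient.

$$
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
$$

Read this way, a $\tilde{O}(\sqrt{T})$ bound says that this quantity, typically taken in expectation over the learner's randomness, grows at most like $\sqrt{T}$ times factors polylogarithmic in $T$.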

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms