arXiv:1107.1744·math.OC·October 11, 2011

Stochastic convex optimization with bandit feedback

Alekh Agarwal, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Alexander Rakhlin

TL;DR

This paper introduces a generalization of the ellipsoid algorithm for stochastic convex optimization with bandit feedback. The algorithm incurs $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret, which matches the $\Omega(\sqrt{T})$ lower bound in its scaling with $T$.

Contribution

The paper generalizes the ellipsoid algorithm to stochastic convex optimization with bandit feedback, achieving regret that is optimal in its scaling with $T$.

Findings

01

Achieves $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret bounds.

02

Provides an algorithm whose regret is optimal in its dependence on $T$.

03

Extends the ellipsoid method to the stochastic bandit setting.

Abstract

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \mathcal{X}$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
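
To make the feedback model and the regret quantity concrete, here is a minimal sketch. It is not the paper's ellipsoid algorithm. The Gaussian noise, the one-dimensional test function, and the names `noisy_oracle` and `cumulative_regret` are all assumptions made for illustration. The sketch reads regret as the sum over rounds of $f(x_t) - \min_{x \in \mathcal{X}} f(x)$, and it shows that a learner which never adapts its query point pays regret linear in $T$.

```python
import random

def noisy_oracle(f, x, sigma, rng):
    """Bandit feedback: return f(x) plus zero-mean noise (Gaussian is an assumption)."""
    return f(x) + rng.gauss(0.0, sigma)

def cumulative_regret(f, queries, f_min):
    """Sum over rounds of f(x_t) - min_x f(x), evaluated with the true f."""
    return sum(f(x) - f_min for x in queries)

# Hypothetical instance: f(x) = |x - 0.3| on X = [0, 1], minimized at x = 0.3.
def f(x):
    return abs(x - 0.3)

rng = random.Random(0)
T = 1000

# A naive learner that always queries the same point and ignores feedback.
fixed_queries = [0.5] * T
observations = [noisy_oracle(f, x, 0.1, rng) for x in fixed_queries]

# The learner only sees `observations`; regret is measured against the true f.
regret = cumulative_regret(f, fixed_queries, f_min=0.0)
print(f"regret after {T} rounds: {regret:.1f}")
```

The fixed query point pays $0.2$ per round here, so the total grows in proportion to $T$. The bound stated in the abstract instead grows like $\sqrt{T}$, up to polynomial factors in $d$ and logarithmic factors.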

Code & Models

Repositories

jw3479/exogenous_mdps
none
