TL;DR
This paper introduces a generalized ellipsoid algorithm for stochastic convex optimization with bandit feedback. Over $T$ queries in dimension $d$, it achieves $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret, which matches the $\Omega(\sqrt{T})$ lower bound in its dependence on $T$.
Contribution
The paper generalizes the ellipsoid algorithm to stochastic convex optimization with bandit feedback and shows that the resulting regret is optimal in its scaling with the number of queries.
Findings
Achieves a regret bound of $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$.
Matches the $\Omega(\sqrt{T})$ lower bound, so the algorithm is optimal in its scaling with $T$.
Extends the ellipsoid method to the stochastic bandit setting.
Abstract
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \mathcal{X}$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
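To make the feedback model and the regret measure concrete, here is a minimal Python sketch. It is not the paper's ellipsoid algorithm. The objective `f`, the noise level `sigma`, and the explore-then-commit baseline are all hypothetical choices made for illustration. The sketch assumes regret is computed on the true function values at the query points, summed over all queries, minus the optimal value for each query, while the learner itself only ever sees noisy observations.

```python
import random

X_STAR = 0.3  # minimizer of the hypothetical objective below


def f(x):
    """Hypothetical convex, 1-Lipschitz objective on the compact set [0, 1]."""
    return abs(x - X_STAR)


def noisy_oracle(x, rng, sigma=0.1):
    """Bandit feedback: a noisy realization of f(x); sigma is an assumed noise level."""
    return f(x) + rng.gauss(0.0, sigma)


def cumulative_regret(queries):
    """Sum of true function values at the query points minus T times the optimum."""
    return sum(f(x) for x in queries) - len(queries) * f(X_STAR)


def explore_then_commit(T, grid, pulls, rng):
    """Naive baseline (NOT the paper's method): sample each grid point
    `pulls` times, then commit to the point with the lowest average."""
    if len(grid) * pulls > T:
        raise ValueError("exploration budget exceeds T")
    queries, means = [], []
    for x in grid:
        observations = [noisy_oracle(x, rng) for _ in range(pulls)]
        queries.extend([x] * pulls)
        means.append(sum(observations) / pulls)
    best = grid[means.index(min(means))]
    queries.extend([best] * (T - len(queries)))
    return queries


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [i / 10 for i in range(11)]
    queries = explore_then_commit(T=1000, grid=grid, pulls=20, rng=rng)
    print("cumulative regret:", cumulative_regret(queries))
```

Every exploratory query away from the minimizer adds to the regret, which is why the bound is stated on the sum over all $T$ queries rather than on the quality of the final point alone.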
