Minimizing Regret of Bandit Online Optimization in Unconstrained Action Spaces

Tatiana Tatarenko; Maryam Kamgarpour

arXiv:1806.05069·math.OC·May 5, 2020

Open Access

TL;DR

This paper introduces a zero-order algorithm for unconstrained online convex optimization that needs no explicit gradient information. It attains a regret of O(n^{2/3}T^{2/3}) with one-point feedback and matches the lower bound of O(n^{1/2}T^{1/2}) with two-point feedback.

Contribution

The paper presents a novel algorithm for unconstrained online convex optimization with zero-order feedback. It estimates gradients from observed cost values and, in the two-point feedback setting, achieves the theoretical lower bound on regret.

Findings

01

Achieves regret of O(n^{2/3}T^{2/3}) with one-point feedback.

02

Adapts to two-point feedback, achieving the theoretical lower bound on regret of O(n^{1/2}T^{1/2}).

03

Algorithm is independent of problem parameters.
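
To get a feel for how the two rates in the findings compare, the short Python sketch below evaluates only their growth terms. The constants hidden by the O(.) notation are ignored, and the values of n and T are illustrative assumptions, not figures from the paper; the function names are hypothetical.

```python
def one_point_rate(n, T):
    """Growth term n^(2/3) * T^(2/3); constants hidden by O(.) are ignored."""
    return n ** (2 / 3) * T ** (2 / 3)


def two_point_rate(n, T):
    """Growth term n^(1/2) * T^(1/2); constants hidden by O(.) are ignored."""
    return n ** 0.5 * T ** 0.5


# Illustrative values only (assumed, not taken from the paper).
n, T = 10, 10 ** 6
for name, rate in [("one-point", one_point_rate), ("two-point", two_point_rate)]:
    total = rate(n, T)
    # Dividing by T gives the growth term of the average regret per round.
    print(f"{name}: total ~ {total:.0f}, per round ~ {total / T:.4f}")
```

Both exponents of T are below 1, so the per-round figure shrinks as T grows. The two-point term shrinks faster, which is consistent with it matching the lower bound.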

Abstract

We consider online convex optimization with a zero-order oracle feedback. In particular, the decision maker does not know the explicit representation of the time-varying cost functions, or their gradients. At each time step, she observes the value of the corresponding cost function evaluated at her chosen action (zero-order oracle). The objective is to minimize the regret, that is, the difference between the sum of the costs she accumulates and that of a static optimal action had she known the sequence of cost functions a priori. We present a novel algorithm to minimize regret in unconstrained action spaces. Our algorithm hinges on a classical idea of one-point estimation of the gradients of the cost functions based on their observed values. The algorithm is independent of problem parameters. Letting $T$ denote the number of queries of the zero-order oracle and $n$ the problem…
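
The one-point idea and the regret definition in the abstract can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the paper's exact procedure: the Gaussian perturbation, the fixed `sigma`, and the names `one_point_gradient` and `regret` are all hypothetical choices made for this example.

```python
import random


def one_point_gradient(cost, mu, sigma, rng):
    """Sample an action around mu, query the cost once, and return
    (action, observed_value, gradient_estimate)."""
    # Assumption: Gaussian perturbation with standard deviation sigma.
    action = [m + sigma * rng.gauss(0.0, 1.0) for m in mu]
    value = cost(action)  # the only feedback: one zero-order query
    # value * (action - mu) / sigma^2 is an unbiased estimate of the
    # gradient of the Gaussian-smoothed cost E[cost(mu + sigma * Z)].
    estimate = [value * (a - m) / sigma ** 2 for a, m in zip(action, mu)]
    return action, value, estimate


def regret(costs, actions, comparator):
    """Accumulated cost of the played actions minus that of one fixed action."""
    incurred = sum(f(x) for f, x in zip(costs, actions))
    benchmark = sum(f(comparator) for f in costs)
    return incurred - benchmark
```

Here `comparator` stands in for the static optimal action, which the caller must supply. For a linear cost, averaging many estimates from `one_point_gradient` should approach the cost's coefficient vector, although any single estimate is very noisy.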

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques