Multi-Point Bandit Algorithms for Nonstationary Online Nonconvex Optimization

Abhishek Roy; Krishnakumar Balasubramanian; Saeed Ghadimi; Prasant Mohapatra

arXiv:1907.13616·stat.ML·September 12, 2019·6 cites

TL;DR

This paper develops and analyzes bandit algorithms for nonstationary online nonconvex optimization. It introduces regret measures based on first-order and second-order stationary solutions, estimates Hessians in the bandit setting, and covers applications to avoiding saddle points and to structured nonconvex functions.

Contribution

The paper proposes new bandit algorithms for nonstationary nonconvex problems, including Hessian estimation via the Gaussian Stein's identity and regret bounds for structured nonconvex functions.

Findings

01

Regret bounds for nonstationary nonconvex optimization using stationary solutions.

02

Hessian estimation in bandit setting via Gaussian Stein's identity.

03

Algorithms for avoiding saddle points in nonstationary bandit problems.
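
As an illustration of the second finding, the following is a minimal sketch of a function-value-only Hessian estimator built on the second-order Gaussian Stein's identity. It is not taken from the paper: the function name `stein_hessian_estimate`, the three-point form of the estimator, the smoothing parameter `nu`, and the requirement that `f` accept a batch of points are all assumptions made for this example. For a quadratic function the estimator is unbiased, which makes it easy to sanity-check.

```python
import numpy as np

def stein_hessian_estimate(f, x, nu=1e-2, num_samples=1000, rng=None):
    """Monte Carlo Hessian estimate from function values only (hypothetical helper).

    Assumes f maps an array of shape (..., d) to an array of shape (...),
    i.e. it can evaluate a batch of points at once.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    # Gaussian directions u ~ N(0, I), one per row.
    u = rng.standard_normal((num_samples, d))
    # Three function evaluations per direction: f(x + nu u), f(x - nu u), f(x).
    diff = f(x + nu * u) + f(x - nu * u) - 2.0 * f(x)
    weights = diff / (2.0 * nu ** 2)
    # Stein's identity weights each sample by (u u^T - I).
    outer = u[:, :, None] * u[:, None, :] - np.eye(d)
    return np.einsum("n,nij->ij", weights, outer) / num_samples
```

With a quadratic such as f(z) = 0.5 zᵀAz, averaging enough samples should recover A up to Monte Carlo noise; for general functions the estimate targets the Hessian of a Gaussian-smoothed version of f, so `nu` trades bias against variance.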

Abstract

Bandit algorithms have been predominantly analyzed in the convex setting with function-value based stationary regret as the performance measure. In this paper, motivated by online reinforcement learning problems, we propose and analyze bandit algorithms for both general and structured nonconvex problems with nonstationary (or dynamic) regret as the performance measure, in both stochastic and non-stochastic settings. First, for general nonconvex functions, we consider nonstationary versions of first-order and second-order stationary solutions as a regret measure, motivated by similar performance measures for offline nonconvex optimization. In the case of second-order stationary solution based regret, we propose and analyze online and bandit versions of the cubic regularized Newton's method. The bandit version is based on estimating the Hessian matrices in the bandit setting, based on…
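
For readers unfamiliar with the cubic regularized Newton's method mentioned in the abstract, here is a minimal sketch of a single step: given a gradient `g` and a (possibly indefinite) Hessian `H`, it approximately minimizes the cubic model gᵀh + ½hᵀHh + (M/6)‖h‖³. This is a generic textbook-style solver, not the paper's algorithm; the name `cubic_newton_step`, the bisection-based subproblem solver, and the constant `M` are assumptions for illustration, and the degenerate "hard case" (gradient orthogonal to the most negative eigenvector) is deliberately not handled.

```python
import numpy as np

def cubic_newton_step(g, H, M, max_iter=200):
    """Approximate minimizer of g.h + 0.5 h'Hh + (M/6)||h||^3 (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(H)
    g_rot = eigvecs.T @ g  # gradient in the eigenbasis of H

    def step_norm(r):
        # Norm of h(r) = -(H + (M r / 2) I)^{-1} g.
        return np.linalg.norm(g_rot / (eigvals + 0.5 * M * r))

    # The solution needs H + (M r / 2) I to be positive semidefinite.
    lo = max(0.0, -2.0 * eigvals[0] / M)
    hi = lo + 1.0
    # Grow the bracket until step_norm(hi) <= hi.
    while step_norm(hi) > hi:
        hi *= 2.0
    # Bisection on the scalar equation step_norm(r) = r.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
    return -eigvecs @ (g_rot / (eigvals + 0.5 * M * hi))
```

In a bandit variant, `g` and `H` would be replaced by estimates built from function evaluations; the step itself is unchanged. The cubic term is what allows a descent step even when `H` has negative eigenvalues, which is the property used to move away from saddle points.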

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Advanced Wireless Network Optimization