Online non-convex optimization with imperfect feedback
Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier

TL;DR
This paper develops a dual-averaging algorithm for online learning with non-convex losses and imperfect feedback. It provides tight regret guarantees and uses a kernel-based estimator to handle the case where only realized losses are observed.
Contribution
The paper introduces a mixed-strategy learning policy based on dual averaging, with regret guarantees for online learning with non-convex losses under inexact feedback. It then extends this policy to the case where the learner observes only its realized losses.
Findings
Derived tight bounds on both the static (external) regret and the regret against the best dynamic policy in hindsight.
Introduced a kernel-based estimator that builds an inexact model of each round's loss function from the learner's realized losses.
Applied the framework to scenarios with limited feedback.
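A small sketch can make the two regret notions concrete. Static (external) regret compares the learner's cumulative loss with that of the best single fixed action in hindsight. Dynamic regret compares it with a comparator that may switch actions every round. The helper `static_and_dynamic_regret` below is hypothetical and does not come from the paper. It restricts comparators to a finite `grid`, which stands in for a continuous action set. It also uses an unconstrained dynamic comparator, whereas dynamic-regret bounds in general depend on how much the comparator is allowed to vary.

```python
from typing import Callable, Sequence, Tuple


def static_and_dynamic_regret(
    losses: Sequence[Callable[[float], float]],
    plays: Sequence[float],
    grid: Sequence[float],
) -> Tuple[float, float]:
    """Return (static regret, dynamic regret) of `plays` against `losses`.

    Hypothetical helper: comparators are restricted to the finite `grid`,
    an assumption approximating a continuous action set.
    """
    # Cumulative loss actually incurred by the learner.
    incurred = sum(loss(x) for loss, x in zip(losses, plays))
    # Static comparator: the best single fixed action in hindsight.
    best_fixed = min(sum(loss(x) for loss in losses) for x in grid)
    # Dynamic comparator: the best action chosen separately at every round.
    best_dynamic = sum(min(loss(x) for x in grid) for loss in losses)
    return incurred - best_fixed, incurred - best_dynamic
```

For example, take the losses (x - 0.2)² and (x - 0.8)², the plays [0.5, 0.5], and the grid [0, 0.2, 0.5, 0.8, 1]. The static regret is 0, since 0.5 is the best fixed point. The dynamic regret is about 0.18, since a per-round comparator can drive each loss to zero.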
Abstract
We consider the problem of online learning with non-convex losses. In terms of feedback, we assume that the learner observes (or otherwise constructs) an inexact model for the loss function encountered at each stage, and we propose a mixed-strategy learning policy based on dual averaging. In this general context, we derive a series of tight regret minimization guarantees, both for the learner's static (external) regret and for the regret incurred against the best dynamic policy in hindsight. Subsequently, we apply this general template to the case where the learner only has access to the actual loss incurred at each stage of the process. This is achieved by means of a kernel-based estimator which generates an inexact model for each round's loss function using only the learner's realized losses as input.
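The pipeline described in the abstract can be illustrated with a minimal sketch: dual averaging over mixed strategies, fed by a kernel-based model built only from realized losses. The sketch rests on assumptions that are ours, not the paper's:

- The action set is [0, 1], discretized to a finite grid.
- The regularizer is the negative entropy, so dual averaging reduces to exponential weights.
- The kernel is a box kernel of half-width `width`.
- A fraction `explore` of uniform exploration keeps every sampling probability away from zero.

The function `kernel_dual_averaging` and its parameters `eta`, `width`, and `explore` are hypothetical names. Each round, the sketch samples a point A from the mixed strategy p and observes only ℓ(A). It then credits every grid point a with ℓ(A)·K(A, a)/p(A). In expectation, this equals a kernel-smoothed version of ℓ at a, which is an inexact model of the round's loss rather than the loss itself.

```python
import math
import random
from typing import Callable, List, Sequence


def kernel_dual_averaging(
    losses: Sequence[Callable[[float], float]],
    grid: Sequence[float],
    eta: float,
    width: float,
    explore: float,
    seed: int = 0,
) -> List[float]:
    """Play one grid point per round, observing only the realized loss.

    Hypothetical sketch: entropic dual averaging (exponential weights) over a
    finite grid, fed by a box-kernel, importance-weighted loss estimate.
    """
    rng = random.Random(seed)
    n = len(grid)
    # Box-kernel support: j is a neighbor of i iff |grid[i] - grid[j]| <= width.
    # The relation is symmetric, so neighbors[a] also lists every played point
    # whose kernel reaches a.
    neighbors = [[j for j in range(n) if abs(grid[i] - grid[j]) <= width]
                 for i in range(n)]
    scores = [0.0] * n  # cumulative estimated losses (the negated dual variable)
    plays: List[float] = []
    for loss in losses:
        # Dual averaging with a negative-entropy regularizer gives Gibbs weights;
        # subtracting the minimum score avoids overflow without changing p.
        m = min(scores)
        weights = [math.exp(-eta * (s - m)) for s in scores]
        total = sum(weights)
        # Mix in uniform exploration so that every p[i] >= explore / n.
        p = [(1 - explore) * w / total + explore / n for w in weights]
        k = rng.choices(range(n), weights=p)[0]
        plays.append(grid[k])
        observed = loss(grid[k])  # the only feedback revealed this round
        # Estimate at a: observed * K(k, a) / p[k], with K(k, a) = 1 / len(neighbors[a]).
        # Its expectation is the average true loss over a's neighbors, i.e. a
        # smoothed (inexact) model of this round's loss at a.
        for a in neighbors[k]:
            scores[a] += observed / (len(neighbors[a]) * p[k])
    return plays
```

As a usage example, take a fixed loss (x - 0.7)² on the grid `[i / 20 for i in range(21)]` with `eta=0.1`, `width=0.1`, and `explore=0.1`. Over a few thousand rounds, the sampled points should drift toward the neighborhood of 0.7. The kernel width trades bias (a wider kernel smooths the loss more) for variance (a wider kernel shares each observation among more points).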
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
