Derivative-Free & Order-Robust Optimisation
Victor Gabillon, Rasul Tutunov, Michal Valko, Haitham Bou Ammar

TL;DR
This paper introduces Vroom, a zeroth-order optimization algorithm for non-stationary and adversarial environments. Vroom achieves vanishing regret and targets simple regret in adversarial settings, a setting that prior work has rarely considered.
Contribution
It formalizes order-robust optimization as an online learning problem that minimizes simple regret, and presents Vroom. The authors state that their results are the first to target simple regret in adversarial settings.
Findings
Vroom achieves vanishing regret in non-stationary environments.
It recovers favorable rates under stochastic reward processes.
It targets simple regret in adversarial scenarios, a challenge rarely considered in prior work.
Abstract
In this paper, we formalise order-robust optimisation as an instance of online learning minimising simple regret, and propose Vroom, a zeroth-order optimisation algorithm capable of achieving vanishing regret in non-stationary environments, while recovering favorable rates under stochastic reward-generating processes. Our results are the first to target simple regret definitions in adversarial scenarios, unveiling a challenge that has been rarely considered in prior work.
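For readers unfamiliar with the term, the sketch below illustrates simple regret in the easier stochastic, stationary case only: the optimiser sees noisy function values (no gradients) and is judged solely on the single point it recommends at the end. It is a minimal illustration under stated assumptions, not the paper's method. The uniform-search baseline, the test function `f`, the noise level, and all function names are hypothetical, and the adversarial definition used in the paper is not reproduced here.

```python
import random


def noisy_eval(f, x, rng, noise=0.1):
    """Zeroth-order oracle: returns a noisy function value, never a gradient."""
    return f(x) + rng.gauss(0.0, noise)


def uniform_search(f, n_rounds, rng, noise=0.1):
    """Hypothetical baseline (NOT Vroom): sample points uniformly in [0, 1)
    and recommend the one with the best observed noisy value."""
    best_x, best_y = None, float("-inf")
    for _ in range(n_rounds):
        x = rng.random()
        y = noisy_eval(f, x, rng, noise)
        if y > best_y:
            best_x, best_y = x, y
    return best_x


def simple_regret(f, f_max, x_recommended):
    """Gap between the optimal value and the value of the recommended point."""
    return f_max - f(x_recommended)


def f(x):
    # Assumed test function, maximised at x = 0.3 with f_max = 1.0.
    return 1.0 - (x - 0.3) ** 2


rng = random.Random(0)
regrets = {
    n: simple_regret(f, 1.0, uniform_search(f, n, rng))
    for n in (10, 100, 1000)
}
for n, r in regrets.items():
    print(f"budget={n:5d}  simple regret={r:.4f}")
```

Unlike cumulative regret, which sums the loss of every query, simple regret ignores the cost of exploration and scores only the final recommendation. Because this baseline picks the best noisy observation, its regret is not guaranteed to shrink monotonically with the budget.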
Taxonomy
Topics: Advanced Control Systems Optimization
