Offline Local Search for Online Stochastic Bandits
Gerdus Benadé, Rathish Das, Thomas Lavastida

TL;DR
This paper introduces a generic framework that converts offline local search algorithms into online stochastic bandit algorithms, achieving polylogarithmic regret for several combinatorial optimization problems.
Contribution
It presents a novel offline-to-online conversion method for local search algorithms, enabling efficient online decision-making with low regret.
Findings
Achieves $O(\log^3 T)$ regret for online combinatorial bandits.
Applies framework to scheduling, matroid base, and clustering problems.
Demonstrates flexibility and improved regret bounds over existing methods.
Abstract
Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across time steps, each time selecting an action and learning the cost of that action. The goal is to minimize regret, defined as the loss compared to the optimal fixed action in hindsight under full-information. There has been substantial interest in leveraging what is known about offline algorithm design in this online setting. Offline greedy and linear optimization algorithms (both exact and approximate) have been shown to provide useful guarantees when deployed online. We investigate local search methods, a broad class of algorithms used widely in both theory and practice, which have thus far been under-explored in this context. We focus on problems where offline local search terminates in an approximately optimal solution and give a…
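To make the setting concrete, here is a minimal toy sketch of running a local search on estimated rather than known costs. It is not the paper's algorithm and carries no regret guarantee. The objective (total completion time on one machine), the adjacent-swap neighbourhood, the uniform noise model, and every function name are assumptions chosen only for illustration.

```python
import random

def total_completion_time(order, proc):
    """Sum of job completion times when jobs run in the given order."""
    elapsed, total = 0.0, 0.0
    for j in order:
        elapsed += proc[j]
        total += elapsed
    return total

def local_search(order, cost):
    """Apply improving adjacent swaps until none remains (a local optimum)."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            candidate = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
            if cost(candidate) < cost(order):
                order, improved = candidate, True
    return order

def estimated_cost(true_means, samples_per_job, rng, noise=0.1):
    """Build a cost function from empirical means of noisy observations.

    Assumed noise model: each observation is the true mean plus a
    uniform perturbation in [-noise, noise].
    """
    est = [
        sum(m + rng.uniform(-noise, noise) for _ in range(samples_per_job))
        / samples_per_job
        for m in true_means
    ]
    return lambda order: total_completion_time(order, est)

# Hypothetical instance: three jobs with unknown mean processing times.
rng = random.Random(0)
means = [5.0, 1.0, 3.0]
cost = estimated_cost(means, samples_per_job=20, rng=rng)
order = local_search([0, 1, 2], cost)
```

In this toy instance the noise is much smaller than the gaps between the means, so the search on estimated costs ends at the same schedule as a search on the true costs would (shortest job first). How many samples to spend before trusting a comparison is the trade-off that a regret analysis has to handle, and this sketch does not attempt it.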
