Algorithm Selection as a Bandit Problem with Unbounded Losses
Matteo Gagliolo, Juergen Schmidhuber

TL;DR
This paper introduces a new bandit-based framework for algorithm selection that handles unbounded and unknown losses, providing theoretical regret bounds and preliminary experimental validation.
Contribution
The paper proposes a simpler bandit formulation of algorithm selection, with partial information and an unknown bound on losses. It adapts an existing solver to this game, proves a bound on the solver's expected regret, and reports initial experimental results.
Findings
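The adapted bandit solver comes with a proven bound on its expected regret.
Preliminary experiments on SAT solvers were successful.
The framework handles losses whose bound is not known in advance, removing the need to set an arbitrary cap on algorithm runtimes.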
Abstract
Algorithm selection is typically based on models of algorithm performance, learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance model is iteratively updated and used to guide selection on a sequence of problem instances. The resulting exploration-exploitation trade-off was represented as a bandit problem with expert advice, using an existing solver for this game, but this required the setting of an arbitrary bound on algorithm runtimes, thus invalidating the optimal regret of the solver. In this paper, we propose a simpler framework for representing algorithm selection as a bandit problem, with partial information, and an unknown bound on losses. We adapt an existing solver to this game, proving a bound on its expected regret, which holds also for the resulting algorithm selection…
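The setting described in the abstract (one algorithm chosen per problem instance, only its own runtime observed, no bound on runtimes known in advance) can be illustrated with a small sketch. This is not the solver adapted in the paper. It is a generic exponential-weights bandit in the style of Exp3, combined with a simple heuristic that doubles a guessed loss bound and restarts whenever a larger loss is observed. The class name `RestartingExp3`, the learning rate `eta`, the `simulate` helper, and the synthetic runtimes are all assumptions made for illustration.

```python
import math
import random


class RestartingExp3:
    """Exp3-style bandit on losses with a guessed, growing loss bound (illustrative only)."""

    def __init__(self, n_arms, eta=0.1, initial_bound=1.0, seed=0):
        self.n_arms = n_arms
        self.eta = eta                  # learning rate (hypothetical default)
        self.bound = initial_bound      # current guess of the largest loss
        self.rng = random.Random(seed)
        self.cum_est = [0.0] * n_arms   # importance-weighted cumulative loss estimates

    def probabilities(self):
        # Exponential weights; subtracting the minimum keeps exp() numerically stable.
        low = min(self.cum_est)
        weights = [math.exp(-self.eta * (c - low)) for c in self.cum_est]
        total = sum(weights)
        return [w / total for w in weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, loss):
        if loss > self.bound:
            # The guess was too small: double it until it covers the loss, then restart.
            while self.bound < loss:
                self.bound *= 2.0
            self.cum_est = [0.0] * self.n_arms
        # Partial information: only the chosen arm's loss is observed, so it is
        # divided by the selection probability to keep the estimate unbiased.
        self.cum_est[arm] += (loss / self.bound) / prob


def simulate(rounds=2000, seed=1):
    """Two hypothetical algorithms with synthetic runtimes; arm 0 is faster."""
    rng = random.Random(seed)
    base_runtime = [1.0, 8.0]           # made-up values, not from the paper
    bandit = RestartingExp3(n_arms=2, seed=seed)
    pulls = [0, 0]
    for _ in range(rounds):
        arm, prob = bandit.select()
        runtime = base_runtime[arm] * (1.0 + rng.random())
        bandit.update(arm, prob, runtime)
        pulls[arm] += 1
    return pulls, bandit.bound
```

Calling `simulate()` returns the number of times each hypothetical algorithm was selected together with the final bound guess. The restart heuristic is only one simple way to cope with an unknown bound and carries none of the regret guarantees discussed in the paper.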
