Lifelong Bandit Optimization: No Prior and No Regret
Felix Schur, Parnian Kassraie, Jonas Rothfuss, Andreas Krause

TL;DR
This paper introduces LIBO, a lifelong bandit optimization algorithm that learns shared kernels across tasks to improve sample efficiency and achieve oracle optimal regret, even without prior knowledge of the kernel.
Contribution
The paper presents LIBO, a novel meta-learning approach for kernelized bandits that adapts across tasks and guarantees sublinear lifelong regret without prior kernel knowledge.
Findings
As more tasks are solved, LIBO's regret on each task converges to that of the bandit algorithm with oracle knowledge of the true kernel.
LIBO can be paired with any kernelized or linear bandit algorithm.
F-LIBO extends LIBO to federated settings without access to individual task data.
Abstract
Machine learning algorithms are often applied repeatedly to problems with similar structure. We focus on solving a sequence of bandit optimization tasks and develop LIBO, an algorithm that adapts to the environment by learning from past experience and becomes more sample-efficient in the process. We assume a kernelized structure where the kernel is unknown but shared across all tasks. LIBO sequentially meta-learns a kernel that approximates the true kernel and solves the incoming tasks with the latest kernel estimate. Our algorithm can be paired with any kernelized or linear bandit algorithm and guarantees oracle optimal performance, meaning that as more tasks are solved, the regret of LIBO on each task converges to the regret of the bandit algorithm with oracle knowledge of the true kernel. Naturally, if paired with a sublinear bandit algorithm, LIBO yields a…
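The loop described in the abstract (solve each incoming task with the latest kernel estimate, then refine that estimate from all data gathered so far) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cosine feature map, the function names (`features`, `ucb_task`, `meta_learn_mask`, `run_lifelong`), the few forced random actions at the start of each task, and the ridge-and-threshold rule used to estimate the kernel are all hypothetical stand-ins chosen for brevity. The unknown shared kernel is modeled here as a subset of active base features, and the base bandit algorithm is a simple UCB rule.

```python
import numpy as np

def features(x, d=8):
    # Hypothetical base feature map (assumption): cos(j * pi * x), j = 0..d-1.
    return np.cos(np.pi * np.outer(x, np.arange(d)))

def ucb_task(Phi, f, mask, T, rng, n_explore=5, beta=2.0, noise=0.1, lam=1.0):
    """Solve one task on a finite domain using the current kernel estimate (mask)."""
    P = Phi[:, mask]                      # features kept by the kernel estimate
    A = lam * np.eye(P.shape[1])          # regularized Gram matrix
    b = np.zeros(P.shape[1])
    idx, ys, regret = [], [], 0.0
    for t in range(T):
        if t < n_explore:                 # assumed forced exploration: random action
            i = int(rng.integers(len(Phi)))
        else:                             # UCB: posterior mean plus scaled std
            A_inv = np.linalg.inv(A)
            mean = P @ (A_inv @ b)
            var = np.einsum("ij,jk,ik->i", P, A_inv, P)
            i = int(np.argmax(mean + beta * np.sqrt(np.maximum(var, 0.0))))
        y = f[i] + noise * rng.standard_normal()
        A += np.outer(P[i], P[i])
        b += y * P[i]
        idx.append(i)
        ys.append(y)
        regret += f.max() - f[i]          # regret against the best point in the domain
    return np.array(idx), np.array(ys), regret

def meta_learn_mask(Phi, history, lam=1.0, rel_tol=0.2):
    """Stand-in kernel estimator: keep features with large ridge weights across tasks."""
    d = Phi.shape[1]
    score = np.zeros(d)
    for idx, ys in history:
        X = Phi[idx]
        theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ ys)
        score += theta ** 2
    # With no history, every score is zero and all features are kept.
    return score >= rel_tol * score.max()

def run_lifelong(n_tasks=20, T=40, d=8, seed=0):
    rng = np.random.default_rng(seed)
    Phi = features(np.linspace(0.0, 1.0, 50), d)
    true_mask = np.zeros(d, dtype=bool)
    true_mask[[1, 3]] = True              # shared structure, hidden from the learner
    history, regrets = [], []
    mask = meta_learn_mask(Phi, history)  # initial estimate: all base features
    for _ in range(n_tasks):
        theta = np.where(true_mask, rng.standard_normal(d), 0.0)
        f = Phi @ theta                   # this task's reward function
        idx, ys, reg = ucb_task(Phi, f, mask, T, rng)
        history.append((idx, ys))
        regrets.append(reg)
        mask = meta_learn_mask(Phi, history)  # refine the estimate after each task
    return regrets, mask, true_mask

if __name__ == "__main__":
    regrets, mask, true_mask = run_lifelong()
    print("per-task regret:", np.round(regrets, 2))
    print("estimated mask: ", mask.astype(int))
    print("true mask:      ", true_mask.astype(int))
```

Because the base bandit routine only sees the masked features, any other kernelized or linear bandit algorithm could be substituted for `ucb_task` in this sketch without touching the outer loop.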
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Age of Information Optimization
