Collaborative Pure Exploration in Kernel Bandit
Yihan Du, Wei Chen, Yuko Kuroki, Longbo Huang

TL;DR
This paper formulates Collaborative Pure Exploration in Kernel Bandit (CoPE-KB), a multi-agent multi-task model with limited communication and general reward functions. It gives algorithms for the fixed-confidence and fixed-budget settings, with matching upper and lower bounds and empirical validation.
Contribution
The paper formulates the CoPE-KB model, designs two algorithms built on kernelized estimators, CoopKernelFC (fixed-confidence) and CoopKernelFB (fixed-budget), and establishes matching upper and lower bounds under both statistical and communication metrics.
Findings
Algorithms achieve computation and communication efficiency.
Theoretical bounds quantify task similarity effects.
Empirical results validate theoretical claims and show superior performance.
Abstract
In this paper, we formulate a Collaborative Pure Exploration in Kernel Bandit problem (CoPE-KB), which provides a novel model for multi-agent multi-task decision making under limited communication and general reward functions, and is applicable to many online learning tasks, e.g., recommendation systems and network scheduling. We consider two settings of CoPE-KB, i.e., Fixed-Confidence (FC) and Fixed-Budget (FB), and design two optimal algorithms CoopKernelFC (for FC) and CoopKernelFB (for FB). Our algorithms are equipped with innovative and efficient kernelized estimators to simultaneously achieve computation and communication efficiency. Matching upper and lower bounds under both the statistical and communication metrics are established to demonstrate the optimality of our algorithms. The theoretical bounds successfully quantify the influences of task similarities on learning…
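For readers unfamiliar with kernelized reward estimation, the following is a minimal single-agent sketch of the general idea, not the paper's CoopKernelFC or CoopKernelFB. It assumes an RBF kernel, a hypothetical bump-shaped reward function, uniform sampling of arms, and plain kernel ridge regression; the names rbf_kernel, kernel_ridge_estimate, and run_demo are illustrative and do not come from the paper.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def kernel_ridge_estimate(x_pulled, rewards, x_arms, reg=1.0, lengthscale=1.0):
    """Kernel ridge regression estimate of the mean reward at every arm."""
    k_tt = rbf_kernel(x_pulled, x_pulled, lengthscale)
    k_at = rbf_kernel(x_arms, x_pulled, lengthscale)
    alpha = np.linalg.solve(k_tt + reg * np.eye(len(x_pulled)), rewards)
    return k_at @ alpha


def run_demo(seed=0):
    """Pull every arm equally often, then report the estimated best arm."""
    rng = np.random.default_rng(seed)
    x_arms = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
    # Hypothetical reward function (an assumption): a bump peaked at x = 0.5.
    true_mean = np.exp(-((x_arms[:, 0] - 0.5) ** 2) / 0.02)
    pulls_per_arm = 50
    idx = np.repeat(np.arange(len(x_arms)), pulls_per_arm)
    rewards = true_mean[idx] + 0.1 * rng.standard_normal(len(idx))
    est = kernel_ridge_estimate(
        x_arms[idx], rewards, x_arms, reg=1.0, lengthscale=0.2
    )
    return int(np.argmax(est)), int(np.argmax(true_mean)), est, true_mean


if __name__ == "__main__":
    best_est, best_true, _, _ = run_demo()
    print("estimated best arm:", best_est, "true best arm:", best_true)
```

The sketch omits everything that makes the collaborative setting distinctive: there is a single agent and a single task, no communication rounds, and no stopping rule for either the fixed-confidence or the fixed-budget setting.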
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques
