Incentivized Lipschitz Bandits
Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

TL;DR
This paper introduces incentivized exploration algorithms for continuous-armed bandits that balance regret minimization and compensation costs, extending to contextual settings with theoretical guarantees and simulations.
Contribution
It presents novel algorithms for incentivized exploration in infinite and contextual bandits, achieving sublinear regret and compensation bounds in continuous metric spaces.
Findings
Algorithms achieve sublinear regret and compensation.
Theoretical bounds depend on the metric space dimension.
Results extend to contextual bandit scenarios.
Abstract
We study incentivized exploration in multi-armed bandit (MAB) settings with infinitely many arms modeled as elements in continuous metric spaces. Unlike classical bandit models, we consider scenarios where the decision-maker (principal) incentivizes myopic agents to explore beyond their greedy choices through compensation, but with the complication of reward drift: biased feedback arising from the incentives. We propose novel incentivized exploration algorithms that discretize the infinite arm space uniformly and demonstrate that these algorithms simultaneously achieve sublinear cumulative regret and sublinear total compensation. Specifically, we derive regret and compensation bounds that depend on the covering dimension of the metric space. Furthermore, we generalize our results to contextual bandits, achieving comparable performance guarantees. We…
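To make the setting in the abstract concrete, here is a minimal Python sketch of incentivized exploration over a uniformly discretized arm space. It is an illustration under stated assumptions, not the paper's algorithm: the arm space is taken to be [0, 1], the principal recommends arms with a standard UCB index, the payment equals the gap in empirical means between the agent's greedy arm and the recommended arm, reward drift is modeled as an additive bias proportional to the payment, and the grid size uses the textbook choice of roughly (T / log T)^(1/3) for a one-dimensional Lipschitz bandit. All function and parameter names (`uniform_grid`, `run_incentivized_ucb`, `drift_scale`) are hypothetical.

```python
import math
import random


def uniform_grid(num_arms):
    """Midpoints of num_arms equal-width cells covering [0, 1]."""
    return [(i + 0.5) / num_arms for i in range(num_arms)]


def run_incentivized_ucb(mean_reward, horizon, drift_scale=0.5, seed=0):
    """Simulate one run; returns (regret, compensation, num_arms).

    Assumed model (not taken from the paper): Gaussian reward noise,
    additive drift equal to drift_scale * payment, UCB recommendations.
    """
    rng = random.Random(seed)
    # Textbook grid size for a 1-D Lipschitz bandit (an assumption here).
    num_arms = max(2, math.ceil((horizon / math.log(horizon)) ** (1 / 3)))
    arms = uniform_grid(num_arms)
    counts = [0] * num_arms
    sums = [0.0] * num_arms

    # Approximate the continuous optimum on a fine reference grid.
    best = max(mean_reward(x) for x in uniform_grid(1000) + arms)
    regret = 0.0
    compensation = 0.0

    for t in range(1, horizon + 1):
        # Empirical means; arms never pulled default to 0.
        means = [sums[i] / counts[i] if counts[i] else 0.0
                 for i in range(num_arms)]

        # Principal's recommendation: largest UCB index.
        def ucb(i):
            if counts[i] == 0:
                return math.inf
            return means[i] + math.sqrt(2.0 * math.log(t) / counts[i])

        recommended = max(range(num_arms), key=ucb)

        # A myopic agent would pick the arm with the best empirical mean.
        greedy = max(range(num_arms), key=lambda i: means[i])

        # Pay the agent the empirical gap so the recommendation is accepted.
        payment = max(0.0, means[greedy] - means[recommended])
        compensation += payment

        # Reward drift: the reported feedback is biased by the payment.
        true_mean = mean_reward(arms[recommended])
        feedback = true_mean + rng.gauss(0.0, 0.1) + drift_scale * payment

        counts[recommended] += 1
        sums[recommended] += feedback
        regret += best - true_mean

    return regret, compensation, num_arms


if __name__ == "__main__":
    # A 1-Lipschitz mean-reward function peaking at x = 0.3.
    result = run_incentivized_ucb(lambda x: 1.0 - abs(x - 0.3), horizon=2000)
    print("regret=%.1f compensation=%.1f arms=%d" % result)
```

The sketch tracks the two quantities the abstract bounds: cumulative regret against the (approximated) continuous optimum and the total compensation paid. Setting `drift_scale=0` removes the bias and recovers plain UCB with payments, which is a convenient baseline for seeing how drift distorts the estimates.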
