Incentivizing Exploration with Linear Contexts and Combinatorial Actions
Mark Sellke

TL;DR
This paper extends incentivized bandit exploration to linear contexts with combinatorial actions, proposing a convexity condition for incentive compatibility and improving initial sample complexity in semibandits.
Contribution
It introduces a convexity-based condition for incentive compatibility in linear bandits and enhances initial data collection efficiency in semibandit models.
Findings
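Thompson sampling becomes Bayesian incentive compatible for linear bandits once enough initial samples are collected, provided the prior satisfies a convexity condition.
Improved sample complexity for the initial data collection phase that precedes Thompson sampling in the semibandit model.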
Extension of incentive-compatible exploration to high-dimensional action spaces.
Abstract
We advance the study of incentivized bandit exploration, in which arm choices are viewed as recommendations and are required to be Bayesian incentive compatible. Recent work has shown under certain independence assumptions that after collecting enough initial samples, the popular Thompson sampling algorithm becomes incentive compatible. We give an analog of this result for linear bandits, where the independence of the prior is replaced by a natural convexity condition. This opens up the possibility of efficient and regret-optimal incentivized exploration in high-dimensional action spaces. In the semibandit model, we also improve the sample complexity for the pre-Thompson sampling phase of initial data collection.
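The two-phase structure described in the abstract (initial data collection, then Thompson sampling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes a standard Gaussian prior N(0, I) on the unknown parameter, Gaussian reward noise, and a finite action set, and it uses a plain round-robin schedule as a placeholder for the initial phase. The round-robin phase is not claimed to be incentive compatible, and the code does not check the convexity condition. All function and parameter names (`run`, `sample_posterior`, `n_initial`, and so on) are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def back_substitute(low, rhs):
    """Solve L^T x = rhs, where low is the lower-triangular factor L."""
    n = len(rhs)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (rhs[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def solve_spd(a, b):
    """Solve a x = b via Cholesky: forward substitution, then backward."""
    low = cholesky(a)
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    return back_substitute(low, y)


def sample_posterior(precision, b, rng):
    """Draw theta ~ N(P^{-1} b, P^{-1}) for posterior precision matrix P."""
    mean = solve_spd(precision, b)
    z = [rng.gauss(0.0, 1.0) for _ in b]
    # If P = L L^T, then L^{-T} z has covariance P^{-1}.
    u = back_substitute(cholesky(precision), z)
    return [m + ui for m, ui in zip(mean, u)]


def run(actions, theta_true, n_initial, n_rounds, noise=1.0, seed=0):
    """Round-robin warm-up (placeholder), then linear Thompson sampling."""
    rng = random.Random(seed)
    d = len(theta_true)
    # Assumed prior: theta ~ N(0, I), so the prior precision is the identity.
    precision = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    chosen = []
    for t in range(n_initial + n_rounds):
        if t < n_initial:
            x = actions[t % len(actions)]  # placeholder initial phase
        else:
            theta = sample_posterior(precision, b, rng)
            x = max(actions, key=lambda a: dot(a, theta))  # recommended action
        reward = dot(x, theta_true) + rng.gauss(0.0, noise)
        # Conjugate Gaussian update of the posterior precision and b.
        for i in range(d):
            b[i] += x[i] * reward / noise ** 2
            for j in range(d):
                precision[i][j] += x[i] * x[j] / noise ** 2
        chosen.append(x)
    return chosen
```

For example, `run([(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], [0.5, -0.2], n_initial=6, n_rounds=20)` returns the list of 26 chosen actions, the first six of which follow the round-robin schedule. In the paper's setting, the number of initial samples needed before Thompson sampling becomes incentive compatible depends on the prior; the value of `n_initial` here is arbitrary.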
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Mobile Crowdsensing and Crowdsourcing
