Incentivizing Exploration with Heterogeneous Value of Money
Li Han, David Kempe, Ruixin Qiang

TL;DR
This paper extends a model for incentivized exploration in multi-armed bandits to agents with heterogeneous and non-linear utilities for money, proposing a convex programming approach for optimal signaling policies.
Contribution
It introduces a convex-program-based method for designing optimal, signal-dependent policies under heterogeneous agent utilities, improving worst-case guarantees.
Findings
A convex program yields optimal policies when agents have heterogeneous utilities for money.
Worst-case guarantees are tight, matching 'Diamonds in the Rough' instances.
More informative signals lead to better approximation ratios.
Abstract
Recently, Frazier et al. proposed a natural model for crowdsourced exploration of different a priori unknown options: a principal is interested in the long-term welfare of a population of agents who arrive one by one in a multi-armed bandit setting. However, each agent is myopic, so in order to incentivize him to explore options with better long-term prospects, the principal must offer the agent money. Frazier et al. showed that a simple class of policies called time-expanded are optimal in the worst case, and characterized their budget-reward tradeoff. The previous work assumed that all agents are equally and uniformly susceptible to financial incentives. In reality, agents may have different utility for money. We therefore extend the model of Frazier et al. to allow agents that have heterogeneous and non-linear utilities for money. The principal is informed of the agent's tradeoff…
