On the price of exact truthfulness in incentive-compatible online   learning with bandit feedback: A regret lower bound for WSU-UX

Ali Mortazavi; Junhao Lin; Nishant A. Mehta

arXiv:2404.05155·cs.LG·April 9, 2024·1 cites

On the price of exact truthfulness in incentive-compatible online learning with bandit feedback: A regret lower bound for WSU-UX

Ali Mortazavi, Junhao Lin, Nishant A. Mehta

PDF

Open Access

TL;DR

None

Contribution

None

Abstract

In one view of the classical game of prediction with expert advice with binary outcomes, in each round, each expert maintains an adversarially chosen belief and honestly reports this belief. We consider a recently introduced, strategic variant of this problem with selfish (reputation-seeking) experts, where each expert strategically reports in order to maximize their expected future reputation based on their belief. In this work, our goal is to design an algorithm for the selfish experts problem that is incentive-compatible (IC, or \emph{truthful}), meaning each expert's best strategy is to report truthfully, while also ensuring the algorithm enjoys sublinear regret with respect to the expert with the best belief. Freeman et al. (2020) recently studied this problem in the full information and bandit settings and obtained truthful, no-regret algorithms by leveraging prior work on…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Bandit Algorithms Research · Auction Theory and Applications · Mobile Crowdsensing and Crowdsourcing