Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting
Ahmed Ben Yahmed, Clément Calauzènes, Vianney Perchet

TL;DR
This paper studies strategic multi-armed bandit problems where arms aim to maximize their utility by withholding rewards, and introduces a mechanism to ensure truthful reporting, enabling the agent to achieve near-optimal rewards with bounded regret.
Contribution
The paper proposes a novel mechanism that induces truthful behavior among strategic arms in multi-armed bandit settings, ensuring the agent can learn effectively despite strategic manipulation.
Findings
Mechanism guarantees truthful reward disclosure by arms.
Agent achieves the second-highest true reward with bounded regret.
The cumulative regret bound is stated in two forms: problem-dependent and worst-case.
Abstract
We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a portion of its reward, and disclosing only a fraction of it to the learning agent. This scenario unfolds as a game over rounds, leading to a competition of objectives between the learning agent, aiming to minimize their regret, and the arms, motivated by the desire to maximize their individual utilities. To address these dynamics, we introduce a new mechanism that establishes an equilibrium wherein each arm behaves truthfully and discloses as much of its rewards as possible. With this mechanism, the agent can attain the second-highest average (true) reward among arms, with a cumulative regret bounded by (problem-dependent) or…
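The reporting model described in the abstract can be illustrated with a minimal sketch. It is not the paper's mechanism: it only shows an arm keeping part of its reward, the agent seeing the rest, and the second-highest true mean that serves as the benchmark. The Bernoulli rewards, the fixed per-arm withholding fractions, the round-robin agent, and every function name are assumptions made for illustration.

```python
import random


def report(true_reward, withheld_fraction):
    """Split a realized reward into (reported, kept) parts.

    Assumption: an arm withholds a fixed fraction of its reward. It can
    never report more than it actually received (no debt).
    """
    if not 0.0 <= withheld_fraction <= 1.0:
        raise ValueError("withheld_fraction must lie in [0, 1]")
    reported = (1.0 - withheld_fraction) * true_reward
    kept = true_reward - reported
    return reported, kept


def second_highest_mean(true_means):
    """Benchmark named in the abstract: second-highest true mean reward."""
    if len(true_means) < 2:
        raise ValueError("need at least two arms")
    return sorted(true_means, reverse=True)[1]


def play_round_robin(true_means, withheld_fractions, rounds, seed=0):
    """Pull arms in turn and return the average reported reward per arm.

    Assumption: Bernoulli rewards and a naive round-robin agent, chosen
    only to keep the sketch short.
    """
    rng = random.Random(seed)
    k = len(true_means)
    totals = [0.0] * k
    pulls = [0] * k
    for t in range(rounds):
        arm = t % k
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        reported, _ = report(reward, withheld_fractions[arm])
        totals[arm] += reported
        pulls[arm] += 1
    return [totals[i] / pulls[i] if pulls[i] else 0.0 for i in range(k)]
```

In this sketch, an arm that withholds half of its reward looks half as good to the agent as it truly is, which is the distortion a truthfulness-inducing mechanism has to counteract.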
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Game Theory and Applications
