Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

Ahmed Ben Yahmed; Clément Calauzènes; Vianney Perchet

arXiv:2501.16018·cs.LG·January 28, 2025

TL;DR

This paper studies strategic multi-armed bandit problems in which each arm seeks to maximize its own utility by withholding part of its reward. It introduces a mechanism under which truthful reporting is an equilibrium, allowing the agent to attain the second-highest average true reward among arms with bounded regret.

Contribution

The paper proposes a novel mechanism that induces truthful behavior among strategic arms in multi-armed bandit settings, ensuring the agent can learn effectively despite strategic manipulation.

Findings

01

Mechanism guarantees truthful reward disclosure by arms.

02

Agent attains the second-highest average (true) reward among arms, with bounded cumulative regret.

03

Regret bounds are given in two forms: problem-dependent, $O(\log(T)/\Delta)$, and worst-case.

Abstract

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a portion of its reward, and disclosing only a fraction of it to the learning agent. This scenario unfolds as a game over $T$ rounds, leading to a competition of objectives between the learning agent, aiming to minimize their regret, and the arms, motivated by the desire to maximize their individual utilities. To address these dynamics, we introduce a new mechanism that establishes an equilibrium wherein each arm behaves truthfully and discloses as much of its rewards as possible. With this mechanism, the agent can attain the second-highest average (true) reward among arms, with a cumulative regret bounded by $O(\log(T)/\Delta)$ (problem-dependent) or…
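The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's mechanism: it runs a standard UCB1 learner that only sees reported rewards, under the simplifying assumptions that true rewards are Bernoulli and that each arm discloses a fixed fraction of its realized reward and keeps the rest. The function name `run_ucb_on_reported` and its parameters `means`, `fractions`, `horizon`, and `seed` are hypothetical, chosen for illustration only.

```python
import math
import random


def run_ucb_on_reported(means, fractions, horizon, seed=0):
    """Run UCB1 on *reported* rewards (illustrative assumption, not the paper's mechanism).

    means[i]     : true Bernoulli mean of arm i (assumed reward model)
    fractions[i] : share of its realized reward that arm i discloses, in [0, 1]
    Returns (pull counts, total reward received by the agent, reward retained per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    reported_sums = [0.0] * k
    retained = [0.0] * k
    agent_total = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once before using the confidence index.
            arm = t - 1
        else:
            # UCB1 index computed from reported (not true) rewards.
            arm = max(
                range(k),
                key=lambda i: reported_sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )

        true_reward = 1.0 if rng.random() < means[arm] else 0.0
        # The arm discloses only a fraction, so it never reports more than it realized.
        reported = fractions[arm] * true_reward

        counts[arm] += 1
        reported_sums[arm] += reported
        agent_total += reported
        retained[arm] += true_reward - reported

    return counts, agent_total, retained


if __name__ == "__main__":
    # Truthful arms versus a best arm that withholds half of its reward.
    print(run_ucb_on_reported([0.9, 0.5], [1.0, 1.0], horizon=2000))
    print(run_ucb_on_reported([0.9, 0.5], [0.5, 1.0], horizon=2000))
```

Comparing the two calls shows the tension the abstract describes: an arm that withholds keeps more reward per pull, but a learner that ranks arms by reported rewards may then pull it less often.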

Videos

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Game Theory and Applications