Strategic Arms with Side Communication Prevail Over Low-Regret MAB Algorithms
Ahmed Ben Yahmed (CREST, ENSAE Paris), Clément Calauzènes, Vianney Perchet (CREST, ENSAE Paris)

TL;DR
This paper demonstrates that in strategic multi-armed bandit scenarios, communication among arms can sustain equilibria where arms retain value and cause high regret for the player, even with partial information sharing.
Contribution
It introduces a communication protocol enabling arms to share information truthfully, achieving equilibria similar to complete information settings.
Findings
Arms can maintain high value through strategic communication.
Shared information among arms leads to linear regret for the player.
Communication protocols can incentivize truthful information sharing.
Abstract
In the strategic multi-armed bandit setting, when arms possess perfect information about the player's behavior, they can establish an equilibrium where (1) they retain almost all of their value and (2) they leave the player with substantial (linear) regret. This study illustrates that, even if complete information is not publicly available to all arms but is shared among them, it is possible to achieve a similar equilibrium. The primary challenge lies in designing a communication protocol that incentivizes the arms to communicate truthfully.
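To make "linear regret" concrete, here is a toy calculation. It is not the paper's protocol or equilibrium construction. It assumes, purely for illustration, that every arm forwards only a fixed fraction of its mean value to the player and that regret is measured against collecting the best arm's full mean each round. The names `regret_lower_bound`, `means`, and `pass_fraction` are hypothetical.

```python
def regret_lower_bound(means, pass_fraction, horizon):
    """Lower bound on the player's expected regret over `horizon` rounds.

    Assumption (illustrative only): each arm keeps a share (1 - pass_fraction)
    of its mean value and forwards the rest, and the benchmark is the best
    arm's full mean value per round.
    """
    best = max(means)
    # Whichever arm is pulled, the player collects at most pass_fraction * best.
    per_round_gap = best - pass_fraction * best
    return per_round_gap * horizon


means = [0.9, 0.6, 0.3]  # hypothetical mean values of three arms
for horizon in (1_000, 10_000, 100_000):
    bound = regret_lower_bound(means, pass_fraction=0.05, horizon=horizon)
    print(horizon, bound)
```

Under these assumptions the bound grows in proportion to the horizon: however well the player's algorithm learns which arm is best, the retained value is never recovered.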
