Learning to Persuade a Biased Receiver
Yuqi Pan, Sadie Zhao, Milind Tambe, Yiling Chen

TL;DR
This paper develops a learning algorithm for a sender who repeatedly persuades a receiver with biased belief updates, achieving $O(\log\log T)$ regret, which a matching lower bound shows to be optimal.
Contribution
It introduces a safe exploration algorithm that learns the receiver's bias while maintaining high persuasion value in a repeated information design setting.
Findings
Achieves $O(\log\log T)$ regret in learning the receiver's bias.
Proves a matching lower bound of $\Omega(\log\log T)$, confirming optimality.
Extends to settings with unknown prior, bias, and time-varying utilities.
Abstract
We study a repeated information design setting in which the receiver, who is also the decision-maker, updates beliefs in a systematically biased way. More specifically, a distorted posterior in our model can be written as a convex combination of the prior and the Bayesian posterior, governed by a fixed but unknown parameter. Over repeated interactions, the sender chooses persuasive signaling schemes, observes only the receiver's realized actions, and seeks to minimize regret relative to a full-information oracle that knows the receiver's biased updating rule. We propose a safe exploration algorithm for learning the receiver's bias while maintaining high persuasion value. The algorithm exploits the asymmetric cost of probing: conservative probes incur only local loss, whereas overly aggressive probes may lose the persuasive opportunity entirely. For general finite state and action spaces…
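The biased updating rule described in the abstract can be illustrated with a small sketch. The abstract only says that the distorted posterior is a convex combination of the prior and the Bayesian posterior; the parametrization below, where a weight `lam` in [0, 1] sits on the Bayesian posterior, is an assumption made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def bayes_posterior(prior: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Bayesian posterior over states after observing one signal.

    likelihoods[i] is P(signal | state i) under the sender's signaling scheme.
    """
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("signal has zero probability under the prior")
    return [j / total for j in joint]


def biased_posterior(
    prior: Sequence[float], likelihoods: Sequence[float], lam: float
) -> List[float]:
    """Distorted posterior as a convex combination of prior and Bayesian posterior.

    Assumed parametrization (not confirmed by the source):
        (1 - lam) * prior + lam * bayes_posterior
    so lam = 1 is a fully Bayesian receiver and lam = 0 ignores the signal.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    post = bayes_posterior(prior, likelihoods)
    return [(1.0 - lam) * p + lam * q for p, q in zip(prior, post)]


# Illustrative numbers only: two states, one observed signal.
prior = [0.7, 0.3]
likelihoods = [0.2, 0.9]
belief = biased_posterior(prior, likelihoods, lam=0.5)
```

Under this assumed parametrization, a receiver with `lam < 1` moves only part of the way from the prior toward the Bayesian posterior, so a signal that would persuade a Bayesian receiver may fall short of the belief threshold at which the biased receiver changes action.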
