TL;DR
This paper introduces Markov persuasion processes for sequential information design, providing efficient algorithms for optimal signaling policies and extending to reinforcement learning with provable regret bounds.
Contribution
It proposes a novel Markov persuasion process model, develops an efficient RL algorithm with sublinear regret, and extends the approach to large state and outcome spaces using function approximation.
Findings
Efficient algorithms for optimal signaling in MPPs.
Sublinear regret bounds for the RL algorithm.
Extension to large state and outcome spaces via function approximation.
Abstract
In today's economy, it has become important for Internet platforms to consider the sequential information design problem in order to align their long-term interests with the incentives of gig service providers. This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs), in which a sender with an informational advantage seeks to persuade a stream of myopic receivers to take actions that maximize the sender's cumulative utility in a finite-horizon Markovian environment with varying priors and utility functions. Planning in MPPs thus faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utility of the sender. Nevertheless, at the population level, where the model is known, it turns out that we can efficiently determine the optimal (resp.…
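To make the persuasiveness requirement concrete, the following is a minimal single-step sketch, not the paper's algorithm. It assumes a direct signaling scheme in which each signal is an action recommendation, and it checks the standard obedience condition from Bayesian persuasion: a myopic receiver who updates on the recommendation must find the recommended action to be a best response. The function name `is_persuasive`, the outcome and action labels, and all numbers are hypothetical, and the Markov state, the horizon, and the sender's utility are deliberately left out.

```python
from typing import Dict, List, Tuple

def is_persuasive(
    prior: Dict[str, float],
    scheme: Dict[str, Dict[str, float]],
    receiver_utility: Dict[Tuple[str, str], float],
    actions: List[str],
    tol: float = 1e-9,
) -> bool:
    """Check obedience of a direct signaling scheme for one myopic receiver.

    prior[w]               : probability of outcome w
    scheme[w][a]           : probability of recommending action a given outcome w
    receiver_utility[w, a] : receiver's payoff for action a under outcome w
    """
    for rec in actions:
        # Unnormalized posterior weight of each outcome given recommendation `rec`.
        weights = {w: prior[w] * scheme[w].get(rec, 0.0) for w in prior}
        if sum(weights.values()) <= tol:
            continue  # this recommendation is never sent, so nothing to check
        # Expected payoff (up to a common normalizer) of each action given `rec`.
        payoff = {
            a: sum(weights[w] * receiver_utility[(w, a)] for w in prior)
            for a in actions
        }
        if any(payoff[a] > payoff[rec] + tol for a in actions):
            return False  # the receiver would rather deviate from `rec`
    return True

# Hypothetical toy instance: the receiver wants to accept only "good" outcomes.
prior = {"good": 0.3, "bad": 0.7}
actions = ["accept", "reject"]
receiver_utility = {
    ("good", "accept"): 1.0, ("good", "reject"): 0.0,
    ("bad", "accept"): 0.0, ("bad", "reject"): 1.0,
}

# Recommends "accept" on every good outcome and on 40% of bad outcomes.
partial_scheme = {
    "good": {"accept": 1.0},
    "bad": {"accept": 0.4, "reject": 0.6},
}
# Recommends "accept" regardless of the outcome, so it conveys no information.
always_accept = {"good": {"accept": 1.0}, "bad": {"accept": 1.0}}

partial_ok = is_persuasive(prior, partial_scheme, receiver_utility, actions)
always_ok = is_persuasive(prior, always_accept, receiver_utility, actions)
```

In this toy instance the partially revealing scheme passes the check, while the uninformative one fails because the receiver's prior alone favors rejecting. In an MPP the same kind of constraint would have to hold at every step while the sender also accounts for long-term utility, which is the tension the abstract describes.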
Videos
Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning · YouTube
