Learning to Lead: Incentivizing Strategic Agents in the Dark
Yuchen Wu, Xinyi Zhong, Zhuoran Yang

TL;DR
This paper introduces a sample-efficient online learning algorithm for a principal interacting with a strategic agent with private information, ensuring near-optimal regret bounds in complex game-theoretic settings.
Contribution
It presents the first provably sample-efficient algorithm for learning optimal mechanisms in a dynamic principal-agent model with strategic, private-type agents.
Findings
Achieves a near-optimal $\tilde{O}(\sqrt{T})$ regret bound.
Develops a novel reward estimation framework using sector tests.
Introduces a delaying mechanism to incentivize myopic behavior.
Abstract
We study an online learning version of the generalized principal-agent model, where a principal interacts repeatedly with a strategic agent possessing private types, private rewards, and taking unobservable actions. The agent is non-myopic, optimizing a discounted sum of future rewards and may strategically misreport types to manipulate the principal's learning. The principal, observing only her own realized rewards and the agent's reported types, aims to learn an optimal coordination mechanism that minimizes strategic regret. We develop the first provably sample-efficient algorithm for this challenging setting. Our approach features a novel pipeline that combines (i) a delaying mechanism to incentivize approximately myopic agent behavior, (ii) an innovative reward angle estimation framework that uses sector tests and a matching procedure to recover type-dependent reward functions, and…
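To make the intuition behind the delaying mechanism concrete, here is a minimal sketch of a standard geometric-series argument. It is not the paper's construction. It assumes (hypothetically) that per-round agent rewards lie in `[0, max_reward]`, that the agent discounts with factor `gamma`, and that a report can only influence the principal's behavior after `delay` rounds. Under those assumptions, the total discounted gain the agent can obtain by misreporting today is at most `max_reward * gamma**delay / (1 - gamma)`, which shrinks geometrically as the delay grows. The function names and parameters are illustrative only.

```python
def delayed_gain_bound(gamma: float, delay: int, max_reward: float = 1.0) -> float:
    """Upper bound on the discounted future gain from a manipulation whose
    effects are felt only `delay` or more rounds later.

    Assumes per-round rewards in [0, max_reward] (an illustrative assumption).
    The bound is the tail of a geometric series:
        sum_{k >= delay} gamma**k * max_reward = max_reward * gamma**delay / (1 - gamma)
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return max_reward * gamma ** delay / (1.0 - gamma)


def min_delay_for_tolerance(gamma: float, eps: float, max_reward: float = 1.0) -> int:
    """Smallest delay for which the manipulation gain bound is at most eps."""
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    delay = 0
    # The bound decreases geometrically in `delay`, so this loop terminates.
    while delayed_gain_bound(gamma, delay, max_reward) > eps:
        delay += 1
    return delay


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        d = min_delay_for_tolerance(gamma, eps=0.01)
        print(f"gamma={gamma}: delay {d} keeps the gain bound at or below 0.01")
```

Because the required delay grows only logarithmically in `1 / eps` for a fixed discount factor, a modest delay can already make strategic manipulation nearly worthless in this toy model, which is the sense in which a delayed agent behaves approximately myopically.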
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Reinforcement Learning in Robotics
