Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments
Mengxin Yu, Zhuoran Yang, Jianqing Fan

TL;DR
This paper introduces PLAN, an offline reinforcement learning algorithm for strategic MDPs with private types, achieving provably near-optimal policies by leveraging instrumental variables and pessimism under information asymmetry.
Contribution
It proposes PLAN, a novel algorithm that handles private types and information asymmetry in strategic MDPs by combining instrumental variable regression with the pessimism principle.
Findings
PLAN outputs a $1/\sqrt{K}$-optimal policy under partial coverage, where $K$ is the number of collected trajectories.
The framework applies to strategic regression, bandits, and recommendation systems.
The algorithm leverages principal's actions as valid instrumental variables.
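The instrument idea can be shown on a one-step toy model. The sketch below is not the paper's model or code. It assumes a hypothetical linear structure in which the principal's action a is drawn independently of the agent's private type u, the agent responds with x = a + u, and the outcome is y = θx + u + ε. Because u is never logged, regressing y on x is confounded. Using a as an instrument removes the bias. The helpers `simulate`, `ols_slope`, and `iv_slope` are illustrative names.

```python
import random
import statistics


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate(n, theta, seed=0):
    """Hypothetical confounded model: the private type u drives both x and y."""
    rng = random.Random(seed)
    a_s, x_s, y_s = [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)  # principal's action, independent of the agent's type
        u = rng.gauss(0.0, 1.0)  # agent's private type, never recorded in the dataset
        x = a + u                # assumed response of a myopic agent
        y = theta * x + u + rng.gauss(0.0, 0.1)  # outcome also depends on u (confounding)
        a_s.append(a)
        x_s.append(x)
        y_s.append(y)
    return a_s, x_s, y_s


def ols_slope(xs, ys):
    """Regress y on x directly; biased because x is correlated with u."""
    return cov(xs, ys) / cov(xs, xs)


def iv_slope(zs, xs, ys):
    """Single-instrument (Wald) IV estimator with z as the instrument."""
    return cov(zs, ys) / cov(zs, xs)


a_s, x_s, y_s = simulate(20_000, theta=2.0)
print("OLS:", round(ols_slope(x_s, y_s), 3))
print("IV: ", round(iv_slope(a_s, x_s, y_s), 3))
```

Under these assumptions the OLS slope converges to θ + Var(u)/(Var(a) + Var(u)) = θ + 1/2. The IV slope converges to θ, because a shifts x while remaining independent of u.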
Abstract
We study offline reinforcement learning under a novel model called strategic MDP, which characterizes the strategic interactions between a principal and a sequence of myopic agents with private types. Due to the bilevel structure and private types, strategic MDP involves information asymmetry between the principal and the agents. We focus on the offline RL problem, where the goal is to learn the optimal policy of the principal concerning a target population of agents based on a pre-collected dataset that consists of historical interactions. The unobserved private types confound such a dataset as they affect both the rewards and observations received by the principal. We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN), which leverages the ideas of instrumental variable regression and the pessimism principle to learn a near-optimal principal's…
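The pessimism principle can be illustrated in a much simpler setting than a strategic MDP. The sketch below applies lower-confidence-bound selection to logged bandit data. It is a generic illustration and not PLAN itself. The width `beta / sqrt(n)` is an assumed Hoeffding-style choice, and `fit_arms`, `greedy_choice`, and `pessimistic_choice` are hypothetical names. It shows why pessimism helps under partial coverage: a rarely logged action with a lucky estimate is not trusted.

```python
import math
from collections import defaultdict


def fit_arms(data):
    """Return {action: (empirical mean reward, sample count)} from logged pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for action, reward in data:
        sums[action] += reward
        counts[action] += 1
    return {a: (sums[a] / counts[a], counts[a]) for a in counts}


def greedy_choice(stats):
    """Pick the action with the highest point estimate, ignoring uncertainty."""
    return max(stats, key=lambda a: stats[a][0])


def pessimistic_choice(stats, beta=1.0):
    """Pick the action with the highest lower confidence bound (mean - beta/sqrt(n))."""
    return max(stats, key=lambda a: stats[a][0] - beta / math.sqrt(stats[a][1]))


# Hypothetical logged data: one action is well covered, the other was tried once.
data = [("well_covered", 0.6)] * 100 + [("rarely_tried", 0.9)]
stats = fit_arms(data)
print(greedy_choice(stats))       # rarely_tried
print(pessimistic_choice(stats))  # well_covered
```

The lower bound for the well-covered action is about 0.6 - 0.1 = 0.5. The single-sample action scores 0.9 - 1.0 = -0.1, so the pessimistic rule only needs the dataset to cover the action it selects.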
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Experimental Behavioral Economics Studies
