Estimating Dynamic Marginal Policy Effects under Sequential Unconfoundedness
I-han Lai, Stefan Wager

TL;DR
This paper introduces methods for estimating how small policy changes affect long-term outcomes in dynamic systems, and proposes a doubly robust estimator under a general sequential unconfoundedness assumption.
Contribution
It develops a tractable approach for identifying and estimating dynamic marginal policy effects (MPEs) that neither requires observing the full dynamic state nor incurs an exponential curse of horizon.
Findings
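The proposed methods are shown to be practical and robust in a number of simulations.
Dynamic MPEs can be estimated under a general sequential unconfoundedness assumption.
One simulation is motivated by a dynamic pricing application in which people use past prices to form a reference level for current prices.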
Abstract
We develop methods for estimating how infinitesimal policy changes affect long-term outcomes in dynamic systems. We show that dynamic marginal policy effects (MPEs) can be identified via tractable reduced-form expressions, and can be estimated under a general sequential unconfoundedness assumption. We also propose a doubly robust estimator for dynamic MPEs. Our approach does not require observing full dynamic state information (as is typically assumed for off-policy evaluation in Markov decision processes), and does not incur an exponential curse of horizon (as is typical in non-Markovian off-policy evaluation). We demonstrate practicality and robustness of our approach in a number of simulations, including one motivated by a dynamic pricing application where people use past prices to form a reference level for current prices.
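To make the estimand concrete, the following is a minimal sketch of what a dynamic marginal policy effect measures in a toy reference-price setting. It is not the paper's estimator: it assumes the simulator is fully known and approximates the derivative of long-run average revenue with respect to a policy parameter by a central finite difference. The demand model, all coefficients, and the names `average_revenue` and `marginal_policy_effect` are hypothetical choices made for illustration only.

```python
import numpy as np


def average_revenue(theta, horizon=200, n_paths=2000, seed=0):
    """Average per-period revenue when prices are drawn around a mean price theta.

    Toy model (an assumption, not taken from the paper):
      price_t  = theta + noise
      demand_t = 10 - 1.0 * price_t - 0.5 * (price_t - ref_t) + noise
      ref_{t+1} = 0.8 * ref_t + 0.2 * price_t   (reference formed from past prices)
    """
    # A fixed seed gives common random numbers across different values of theta.
    rng = np.random.default_rng(seed)
    ref = np.full(n_paths, 5.0)  # initial reference price, independent of theta
    total = np.zeros(n_paths)
    for _ in range(horizon):
        price = theta + 0.1 * rng.standard_normal(n_paths)
        demand = (10.0 - 1.0 * price - 0.5 * (price - ref)
                  + 0.1 * rng.standard_normal(n_paths))
        total += price * demand
        ref = 0.8 * ref + 0.2 * price  # customers update their reference level
    return total.mean() / horizon


def marginal_policy_effect(theta, eps=1e-2, **kwargs):
    """Central finite-difference approximation of d(average revenue)/d(theta)."""
    up = average_revenue(theta + eps, **kwargs)
    down = average_revenue(theta - eps, **kwargs)
    return (up - down) / (2.0 * eps)


if __name__ == "__main__":
    print(marginal_policy_effect(3.0))
```

In this toy model the reference price tracks theta in the long run, so the reference term cancels in expectation and the long-run derivative approaches 10 - 2 * theta. A myopic calculation that holds the reference level fixed would instead give 10 - 2.5 * theta, which illustrates why the dynamic effect of a small policy change can differ from its one-period effect.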
