Estimation of Treatment Effects Under Nonstationarity via the Truncated Policy Gradient Estimator
Ramesh Johari, Tianyi Peng, Wenqian Xing

TL;DR
This paper introduces the Truncated Policy Gradient (TPG) estimator for estimating treatment effects in nonstationary dynamic systems; it reduces both bias and variance relative to existing methods.
Contribution
The paper proposes the novel TPG estimator, which accounts for temporal dynamics under nonstationarity by replacing instantaneous outcomes with short-horizon outcome trajectories, and supports it with theoretical guarantees and empirical validation.
Findings
The TPG estimator provably reduces bias and variance in nonstationary Markovian settings.
A central limit theorem is established for the TPG estimator.
Empirical validation on real-world case studies shows improved performance.
Abstract
Randomized experiments (or A/B tests) are widely used to evaluate interventions in dynamic systems such as recommendation platforms, marketplaces, and digital health. In these settings, interventions affect both current and future system states, so estimating the global average treatment effect (GATE) requires accounting for temporal dynamics, which is especially challenging in the presence of nonstationarity; existing approaches suffer from high bias, high variance, or both. In this paper, we address this challenge via the novel Truncated Policy Gradient (TPG) estimator, which replaces instantaneous outcomes with short-horizon outcome trajectories. The estimator admits a policy-gradient interpretation: it is a truncation of the first-order approximation to the GATE, yielding provable reductions in bias and variance in nonstationary Markovian settings. We further establish a central…
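The core idea in the abstract, replacing each instantaneous outcome with a short-horizon outcome trajectory, can be illustrated with a minimal Python sketch. It assumes a design the abstract does not specify: treatment is randomized independently at each time step with a known probability p, and each step's inverse-propensity weight multiplies the sum of that step's outcome and the next k outcomes. The function name, the parameters k and p, and the weighting scheme are illustrative assumptions, not the paper's definition of the TPG estimator.

```python
from typing import Sequence


def truncated_trajectory_estimate(
    actions: Sequence[int],
    outcomes: Sequence[float],
    k: int,
    p: float = 0.5,
) -> float:
    """Hypothetical sketch of a truncated-trajectory treatment effect estimate.

    actions[t]  : 1 if time step t was treated, 0 otherwise (assumed to be
                  randomized independently with probability p).
    outcomes[t] : outcome observed at time step t.
    k           : truncation horizon; k = 0 uses instantaneous outcomes only.
    """
    if len(actions) != len(outcomes):
        raise ValueError("actions and outcomes must have the same length")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")

    horizon = len(actions)
    total = 0.0
    for t in range(horizon):
        # Short-horizon trajectory: Y_t + ... + Y_{t+k}, cut off at the end
        # of the experiment (slicing past the end is safe in Python).
        window = sum(outcomes[t : t + k + 1])
        # Inverse-propensity weight: +1/p if treated, -1/(1-p) if control.
        weight = actions[t] / p - (1 - actions[t]) / (1 - p)
        total += weight * window
    return total / horizon


# Toy data, made up purely for illustration.
toy_actions = [1, 0, 1, 0]
toy_outcomes = [2.0, 1.0, 3.0, 1.0]
print(truncated_trajectory_estimate(toy_actions, toy_outcomes, k=0))  # 1.5
print(truncated_trajectory_estimate(toy_actions, toy_outcomes, k=1))  # 1.0
```

With k = 0 the sketch reduces to a standard inverse-propensity-weighted comparison of instantaneous outcomes. Larger k lets each treatment decision be credited with outcomes in the periods that follow it, which is the role the abstract assigns to short-horizon trajectories.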
Taxonomy
Topics: Advanced Causal Inference Techniques · Health Systems, Economic Evaluations, Quality of Life
