A Reinforcement Learning Approach to Estimating Long-term Treatment Effects
Ziyang Tang, Yiheng Duan, Stephanie Zhang, Lihong Li

TL;DR
This paper introduces a reinforcement learning method to estimate long-term treatment effects in scenarios where effects evolve over time, addressing limitations of traditional A/B testing in nonstationary environments.
Contribution
It develops a new RL algorithm for a class of nonstationary problems to estimate long-term treatment effects, demonstrated on two synthetic datasets and one online store dataset.
Findings
Promising results on two synthetic datasets
Promising results on one online store dataset
Targets a class of problems where the observed state transition is nonstationary
Abstract
Randomized experiments (a.k.a. A/B tests) are a powerful tool for estimating treatment effects, to inform decision-making in business, healthcare and other applications. In many problems, the treatment has a lasting effect that evolves over time. A limitation of randomized experiments is that they do not easily extend to measure long-term effects, since running long experiments is time-consuming and expensive. In this paper, we take a reinforcement learning (RL) approach that estimates the average reward in a Markov process. Motivated by real-world scenarios where the observed state transition is nonstationary, we develop a new algorithm for a class of nonstationary problems, and demonstrate promising results on two synthetic datasets and one online store dataset.
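To make "estimating the average reward in a Markov process" concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a small, fully observed, stationary tabular chain, whereas the paper targets nonstationary transitions. All function names, the two-state chain, and its rewards are hypothetical and chosen only for illustration. The idea is that short-horizon transitions collected in each experiment arm can be used to fit a transition model, whose stationary distribution gives a long-run average reward per arm. The difference between arms is then a long-term treatment effect estimate.

```python
import numpy as np

def estimate_model(transitions, n_states):
    """Fit a tabular transition matrix and per-state mean reward
    from (state, reward, next_state) tuples."""
    counts = np.zeros((n_states, n_states))
    reward_sum = np.zeros(n_states)
    visits = np.zeros(n_states)
    for s, r, s_next in transitions:
        counts[s, s_next] += 1
        reward_sum[s] += r
        visits[s] += 1
    safe_visits = np.maximum(visits, 1)
    # Unvisited states fall back to a uniform row and zero mean reward.
    P = np.where(visits[:, None] > 0, counts / safe_visits[:, None], 1.0 / n_states)
    r_bar = reward_sum / safe_visits
    return P, r_bar

def stationary_distribution(P):
    """Solve d P = d with sum(d) = 1 via least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d

def average_reward(transitions, n_states):
    """Long-run average reward implied by the fitted (stationary) model."""
    P, r_bar = estimate_model(transitions, n_states)
    return float(stationary_distribution(P) @ r_bar)

def simulate(P, rewards, n_steps, rng, start=0):
    """Generate (state, reward, next_state) tuples from a known chain."""
    s, out = start, []
    for _ in range(n_steps):
        s_next = int(rng.choice(len(rewards), p=P[s]))
        out.append((s, rewards[s], s_next))
        s = s_next
    return out

# Hypothetical two-state example: state 1 is the "engaged" state with reward 1.
rng = np.random.default_rng(0)
rewards = np.array([0.0, 1.0])
P_control = np.array([[0.9, 0.1], [0.5, 0.5]])
P_treatment = np.array([[0.7, 0.3], [0.5, 0.5]])

control_data = simulate(P_control, rewards, 5000, rng)
treatment_data = simulate(P_treatment, rewards, 5000, rng)

effect = average_reward(treatment_data, 2) - average_reward(control_data, 2)
print(f"Estimated long-term treatment effect: {effect:.3f}")
```

Under these assumptions the estimate depends only on observed one-step transitions, which is why short experiments can inform a long-horizon quantity. When the transitions drift over time, as in the setting the abstract describes, a single fitted matrix is no longer adequate, and that is the gap the paper's algorithm addresses.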
Taxonomy
Topics: Advanced Causal Inference Techniques
