Post Reinforcement Learning Inference
Vasilis Syrgkanis, Ruohan Zhan

TL;DR
This paper develops a weighted GMM method for valid inference on structural parameters from data collected by reinforcement learning algorithms. It addresses the challenges posed by adaptive data collection and nonstationary behavior policies.
Contribution
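It introduces a weighted GMM approach with adaptive weights that restores asymptotic normality for estimators computed on RL data. Standard moment-based estimators can fail to be asymptotically normal there because the variance of the moments changes over time.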
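The weighted moment equation below shows the general shape of this kind of estimator. It is a generic form assumed for illustration; the paper's exact weights and GMM weighting matrix are not given in this summary. Here $\mathcal{F}_{t-1}$ denotes the data observed before stage $t$, $m$ is a moment function identifying $\theta_0$, and $H_t$ is a weight chosen from past data only (just-identified case):

```latex
\sum_{t=1}^{T} H_t\, m(Z_t;\hat\theta) = 0,
\qquad
\mathbb{E}\!\left[m(Z_t;\theta_0)\mid \mathcal{F}_{t-1}\right] = 0,
\qquad
H_t = \Sigma_t^{-1/2},\quad
\Sigma_t = \operatorname{Var}\!\left(m(Z_t;\theta_0)\mid \mathcal{F}_{t-1}\right).
```

With this choice the terms $H_t\, m(Z_t;\theta_0)$ are martingale differences with identity conditional variance. A martingale central limit theorem can therefore apply even when the behavior policy, and with it $\Sigma_t$, drifts over time. In practice $\Sigma_t$ would have to be estimated.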
Findings
Proposes a weighted GMM estimator for RL data.
Ensures asymptotic normality under adaptive data collection.
Enables valid hypothesis testing and confidence intervals.
Abstract
We study estimation and inference using data collected by reinforcement learning (RL) algorithms. These algorithms adaptively experiment by interacting with individual units over multiple stages, updating their strategies based on past outcomes. Our goal is to evaluate a counterfactual policy after data collection and estimate structural parameters, such as dynamic treatment effects, that support credit assignment and quantify the impact of early actions on final outcomes. These parameters can often be defined as solutions to moment equations, motivating moment-based estimation methods developed for static data. In RL settings, however, data are often collected adaptively under nonstationary behavior policies. As a result, standard estimators fail to achieve asymptotic normality due to time-varying variance. We propose a weighted generalized method of moments (GMM) approach that uses…
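Below is a minimal sketch of the weighting idea. It rests on assumptions that do not come from the paper:

- A single-stage, two-arm adaptive experiment stands in for the multi-stage RL setting.
- The target is the mean outcome under the counterfactual policy "always choose arm 1".
- The moment is the inverse-propensity-weighted (IPW) score minus θ.
- The weights h_t = sqrt(e_t) are one variance-stabilizing choice from the adaptive-experiment literature.

The helper names `simulate_adaptive_bandit` and `weighted_moment_estimate` are hypothetical.

```python
import numpy as np


def simulate_adaptive_bandit(T, mu=(0.0, 0.5), floor=0.05, seed=0):
    """Hypothetical two-arm experiment with a nonstationary behavior policy.

    Arm 1 is chosen with probability e_t, recomputed each step from running
    means of past outcomes (a clipped greedy rule assumed for illustration).
    """
    rng = np.random.default_rng(seed)
    arms, ys, props = np.zeros(T, dtype=int), np.zeros(T), np.zeros(T)
    sums, counts = np.zeros(2), np.zeros(2)
    for t in range(T):
        means = sums / np.maximum(counts, 1)
        # Lean toward the arm that currently looks better, but keep e_t in [floor, 1 - floor].
        e = float(np.clip(0.5 + means[1] - means[0], floor, 1 - floor))
        a = int(rng.random() < e)
        y = mu[a] + rng.standard_normal()
        arms[t], ys[t], props[t] = a, y, e  # e_t is logged, so it is known to the analyst
        sums[a] += y
        counts[a] += 1
    return arms, ys, props


def weighted_moment_estimate(arms, ys, props, z=1.96):
    """Solve sum_t h_t * (Gamma_t - theta) = 0 for the mean outcome of arm 1."""
    gamma = arms * ys / props  # IPW score: E[Gamma_t | history] = mu_1
    h = np.sqrt(props)         # assumed weights: Var(Gamma_t | history) grows roughly like 1 / e_t
    theta = np.sum(h * gamma) / np.sum(h)
    se = np.sqrt(np.sum(h**2 * (gamma - theta) ** 2)) / np.sum(h)
    return theta, (theta - z * se, theta + z * se)


arms, ys, props = simulate_adaptive_bandit(T=5000)
theta_hat, ci = weighted_moment_estimate(arms, ys, props)
print(f"estimate {theta_hat:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

The IPW score's conditional variance grows roughly like 1/e_t, so multiplying each term by sqrt(e_t) keeps its contribution roughly constant as the behavior policy drifts. This is the property a normal approximation needs. With the unweighted average (h_t = 1), the low-propensity stages would instead dominate the variance.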
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods in Clinical Trials
