Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference
Lars van der Laan, David Hubbard, Allen Tran, Nathan Kallus, Aurélien Bibaut

TL;DR
This paper introduces a semiparametric extension of Double Reinforcement Learning that relaxes overlap conditions and improves efficiency in long-term causal inference, avoiding complex density-ratio estimation.
Contribution
It develops doubly robust, automatic estimators for general linear functionals of the Q-function in infinite-horizon, time-homogeneous MDPs, and reduces the high-dimensional overlap requirement to a one-dimensional one.
Findings
Achieves efficiency gains over nonparametric methods.
Reduces the overlap condition to a one-dimensional requirement.
Avoids density-ratio estimation by using the Q-function as a summary.
Abstract
Double Reinforcement Learning (DRL) enables efficient inference for policy values in nonparametric Markov decision processes (MDPs), but existing methods face two major obstacles: (1) they require stringent intertemporal overlap conditions on state trajectories, and (2) they rely on estimating high-dimensional occupancy density ratios. Motivated by problems in long-term causal inference, we extend DRL to a semiparametric setting and develop doubly robust, automatic estimators for general linear functionals of the Q-function in infinite-horizon, time-homogeneous MDPs. By imposing structure on the Q-function, we relax the overlap conditions required by nonparametric methods and obtain efficiency gains. The second obstacle--density-ratio estimation--typically requires computationally expensive and unstable min-max optimization. To address both challenges, we introduce superefficient…
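To make the estimand in the abstract concrete, the sketch below computes a Q-function and one linear functional of it (the policy value) in a small tabular MDP whose transitions and rewards are known. This is an illustration only, not the paper's estimator: it does no estimation from data and involves no double robustness or debiasing. The discount factor `gamma`, the toy arrays `P`, `R`, `pi`, `mu`, and the function names `evaluate_q` and `policy_value` are all assumptions introduced for this example.

```python
# Illustrative sketch: a Q-function and a linear functional of it in a known,
# time-homogeneous tabular MDP. All names and numbers are hypothetical, and
# discounting with factor gamma is an assumption of this example.

def evaluate_q(P, R, pi, gamma, tol=1e-12, max_iter=100_000):
    """Policy evaluation by fixed-point iteration on the Bellman equation
    Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * sum_a' pi(a' | s') Q(s', a').
    """
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(max_iter):
        # State values under pi implied by the current Q.
        V = [sum(pi[s][a] * Q[s][a] for a in range(n_a)) for s in range(n_s)]
        Q_new = [
            [R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_s))
             for a in range(n_a)]
            for s in range(n_s)
        ]
        diff = max(abs(Q_new[s][a] - Q[s][a])
                   for s in range(n_s) for a in range(n_a))
        Q = Q_new
        if diff < tol:
            break
    return Q


def policy_value(Q, pi, mu):
    """Linear functional of Q: sum_s mu(s) * sum_a pi(a | s) * Q(s, a),
    where mu is an initial-state distribution."""
    return sum(mu[s] * pi[s][a] * Q[s][a]
               for s in range(len(Q)) for a in range(len(Q[0])))


# Toy MDP with 2 states and 2 actions. P[s][a] is a distribution over next states.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
R = [[1.0, 0.0], [0.0, 2.0]]    # R[s][a]: expected immediate reward
pi = [[0.5, 0.5], [0.5, 0.5]]   # pi[s][a]: target policy (uniform here)
mu = [1.0, 0.0]                 # initial-state distribution

Q = evaluate_q(P, R, pi, gamma=0.9)
print("policy value:", policy_value(Q, pi, mu))
```

The point of the sketch is that `policy_value` depends on `Q` only through a weighted sum, so it is linear in `Q`. Other linear functionals of the same form can be obtained by changing the weights `mu` and `pi` in the final step.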
Taxonomy
Topics: Advanced Causal Inference Techniques · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
