Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference

Lars van der Laan; David Hubbard; Allen Tran; Nathan Kallus; Aurélien Bibaut

arXiv:2501.06926·stat.ML·November 14, 2025

Open Access

TL;DR

This paper introduces a semiparametric extension of Double Reinforcement Learning that relaxes overlap conditions and improves efficiency in long-term causal inference, avoiding complex density-ratio estimation.

Contribution

The paper develops doubly robust, automatic estimators for general linear functionals of the Q-function in infinite-horizon, time-homogeneous MDPs, and reduces the high-dimensional overlap requirement to a one-dimensional one.

Findings

01 Achieves efficiency gains over nonparametric methods by imposing structure on the Q-function.

02 Reduces the overlap condition to a one-dimensional requirement.

03 Avoids density-ratio estimation by using the Q-function as a summary.

Abstract

Double Reinforcement Learning (DRL) enables efficient inference for policy values in nonparametric Markov decision processes (MDPs), but existing methods face two major obstacles: (1) they require stringent intertemporal overlap conditions on state trajectories, and (2) they rely on estimating high-dimensional occupancy density ratios. Motivated by problems in long-term causal inference, we extend DRL to a semiparametric setting and develop doubly robust, automatic estimators for general linear functionals of the Q-function in infinite-horizon, time-homogeneous MDPs. By imposing structure on the Q-function, we relax the overlap conditions required by nonparametric methods and obtain efficiency gains. The second obstacle, density-ratio estimation, typically requires computationally expensive and unstable min-max optimization. To address both challenges, we introduce superefficient…

Taxonomy

Topics: Advanced Causal Inference Techniques · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference