Reinforcement Learning with Continuous Actions Under Unmeasured Confounding

Yuhan Li; Eugene Han; Yifan Hu; Wenzhuo Zhou; Zhengling Qi; Yifan Cui; Ruoqing Zhu

arXiv:2505.00304·stat.ML·May 2, 2025

TL;DR

This paper develops a novel approach for offline reinforcement learning with continuous actions in the presence of unmeasured confounders, providing theoretical guarantees and practical algorithms for policy optimization.

Contribution

It introduces a new identification result and a minimax estimator for evaluating a target policy under unmeasured confounding in continuous action spaces, extending prior work that mostly assumes discrete actions.

Findings

01

The proposed estimator is consistent and has finite-sample error bounds.

02

The policy-gradient algorithm searches a policy class for the in-class optimal policy, the one that maximizes the estimated policy value.

03

Empirical results demonstrate improved policy performance in simulations and real data.

Abstract

This paper addresses the challenge of offline policy learning in reinforcement learning with continuous action spaces when unmeasured confounders are present. While most existing research focuses on policy evaluation within partially observable Markov decision processes (POMDPs) and assumes discrete action spaces, we advance this field by establishing a novel identification result to enable the nonparametric estimation of policy value for a given target policy under an infinite-horizon framework. Leveraging this identification, we develop a minimax estimator and introduce a policy-gradient-based algorithm to identify the in-class optimal policy that maximizes the estimated policy value. Furthermore, we provide theoretical results regarding the consistency, finite-sample error bound, and regret bound of the resulting optimal policy. Extensive simulations and a real-world application…
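
To make the phrase "policy-gradient-based algorithm to identify the in-class optimal policy" concrete, here is a minimal sketch of gradient ascent over a one-parameter Gaussian policy class with continuous actions. It is not the paper's method: it draws rewards from a hypothetical, fully known toy simulator and ignores confounding entirely, whereas the paper maximizes a value estimated from offline data via its minimax estimator. The `reward` function, the linear policy mean `theta * s`, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def reward(s, a):
    # Hypothetical reward: largest when the action equals 2 * state.
    return -(a - 2.0 * s) ** 2

def policy_gradient_step(theta, rng, n=4096, sigma=0.3, lr=0.5):
    # Sample states and continuous actions from the Gaussian policy
    # a ~ Normal(theta * s, sigma^2).
    s = rng.uniform(-1.0, 1.0, size=n)
    a = theta * s + sigma * rng.standard_normal(n)
    r = reward(s, a)
    # Score function: d/d(theta) of log pi(a | s).
    score = (a - theta * s) * s / sigma**2
    # Subtracting the mean reward is a baseline that reduces variance.
    grad = np.mean((r - r.mean()) * score)
    # Gradient ascent on the (sampled) policy value.
    return theta + lr * grad

def fit_policy(n_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(n_steps):
        theta = policy_gradient_step(theta, rng)
    return theta

theta_hat = fit_policy()
print(f"learned slope: {theta_hat:.3f}")
```

In this toy setup the best policy in the class has slope 2, so the learned slope should settle close to that value. The "in-class" qualifier matters: the search only ranges over policies of the form `theta * s`, so the result is optimal relative to that class rather than over all possible policies.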
