Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning
Shuguang Yu, Shuxing Fang, Ruixin Peng, Zhengling Qi, Fan Zhou, Chengchun Shi

TL;DR
This paper introduces a two-way deconfounder method for off-policy evaluation in causal reinforcement learning. It addresses unmeasured confounders by using a neural tensor network to learn them jointly with the system dynamics.
Contribution
It proposes a two-way unmeasured confounding assumption and develops a neural tensor network-based algorithm for consistent policy value estimation.
Findings
The estimator is theoretically consistent.
Numerical experiments demonstrate improved accuracy.
The method effectively handles unmeasured confounders.
Abstract
This paper studies off-policy evaluation (OPE) in the presence of unmeasured confounders. Inspired by the two-way fixed effects regression model widely used in the panel data literature, we propose a two-way unmeasured confounding assumption to model the system dynamics in causal reinforcement learning and develop a two-way deconfounder algorithm that devises a neural tensor network to simultaneously learn both the unmeasured confounders and the system dynamics, based on which a model-based estimator can be constructed for consistent policy value estimation. We illustrate the effectiveness of the proposed estimator through theoretical results and numerical experiments.
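The two-way fixed effects regression model that the abstract cites as inspiration can be illustrated with a small sketch. This is not the paper's deconfounder algorithm or its neural tensor network; it only shows, on simulated data, how an additive unit-specific effect and an additive time-specific effect can be removed by double demeaning a balanced panel. The function names (`two_way_within`, `twfe_slope`), the data-generating process, and all numeric settings are assumptions made for illustration.

```python
import numpy as np


def two_way_within(x):
    """Double-demean a balanced panel of shape (N units, T periods).

    Subtracting unit means and period means (and adding back the grand mean)
    removes any additive unit-specific and time-specific components.
    """
    unit_mean = x.mean(axis=1, keepdims=True)
    time_mean = x.mean(axis=0, keepdims=True)
    return x - unit_mean - time_mean + x.mean()


def twfe_slope(y, x):
    """Slope of y on a single regressor x after the two-way within transform."""
    y_w = two_way_within(y)
    x_w = two_way_within(x)
    return float((x_w * y_w).sum() / (x_w ** 2).sum())


# Simulated panel (hypothetical): unit effects alpha_i and time effects lam_t
# are unobserved and correlated with the regressor, so they confound a naive fit.
rng = np.random.default_rng(0)
N, T = 50, 20
alpha = rng.normal(size=(N, 1))            # unit-specific, time-invariant
lam = rng.normal(size=(1, T))              # time-specific, unit-invariant
x = alpha + lam + rng.normal(size=(N, T))  # regressor depends on both effects
beta_true = 2.0
y = alpha + lam + beta_true * x + 0.1 * rng.normal(size=(N, T))

beta_twfe = twfe_slope(y, x)

# Pooled least squares that ignores both effects, for comparison.
xc, yc = x - x.mean(), y - y.mean()
beta_naive = float((xc * yc).sum() / (xc ** 2).sum())

print(f"true slope:  {beta_true}")
print(f"two-way FE:  {beta_twfe:.3f}")
print(f"naive pooled: {beta_naive:.3f}")
```

In this simulation the pooled fit absorbs the unobserved effects into the slope, while the double-demeaned fit does not. The paper's setting differs in that the confounders enter the system dynamics of a reinforcement learning problem and are learned with a neural tensor network rather than removed by a linear transform.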
Taxonomy
Topics: Advanced Causal Inference Techniques
