A Graphical Approach to State Variable Selection in Off-policy Learning
Joakim Blach Andersen, Qingyuan Zhao

TL;DR
This paper introduces a graphical framework for identifying relevant variables in off-policy learning across various decision-making settings, bridging the gap between medical and reinforcement learning applications.
Contribution
The paper develops new graphical identification criteria, based on causal inference theory, that unify dynamic treatment regimes (DTRs) and reinforcement learning (RL), and it clarifies the assumptions and limitations involved.
Findings
Graphical criteria help identify the causal effect of a policy in off-policy learning.
Violating the criteria can lead to suboptimal policies.
A simulation demonstrates the practical implications in logistics pricing.
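The second finding can be illustrated with a toy calculation. The sketch below is not the paper's simulation. It is a hypothetical one-step problem with a binary state `S` that influences both the logged action and the reward, and every number in it (`P_S`, `BEHAVIOR`, `mean_reward`) is an assumption chosen for illustration. It compares the value of each action when `S` is ignored with the value obtained by adjusting for `S`.

```python
# Toy one-step decision problem (all quantities are illustrative assumptions).
# S is a binary state that affects both the logged action and the reward.

P_S = {0: 0.5, 1: 0.5}        # P(S = s)
BEHAVIOR = {0: 0.2, 1: 0.8}   # P(A = 1 | S = s) under the logging policy


def mean_reward(s, a):
    """E[R | S = s, A = a]: the state helps, the action slightly hurts."""
    return 2.0 * s - 0.5 * a


def joint(s, a):
    """P(S = s, A = a) in the logged (non-randomized) data."""
    p_a1 = BEHAVIOR[s]
    return P_S[s] * (p_a1 if a == 1 else 1.0 - p_a1)


def naive_value(a):
    """E[R | A = a]: ignores S, so it mixes the action effect with confounding."""
    num = sum(joint(s, a) * mean_reward(s, a) for s in P_S)
    den = sum(joint(s, a) for s in P_S)
    return num / den


def adjusted_value(a):
    """sum_s P(s) E[R | s, a]: value of always playing a, adjusting for S."""
    return sum(P_S[s] * mean_reward(s, a) for s in P_S)


naive = {a: naive_value(a) for a in (0, 1)}
adjusted = {a: adjusted_value(a) for a in (0, 1)}

naive_best = max(naive, key=naive.get)
adjusted_best = max(adjusted, key=adjusted.get)

print("naive values:   ", naive, "-> picks action", naive_best)
print("adjusted values:", adjusted, "-> picks action", adjusted_best)
```

In this toy setup, leaving `S` out makes action 1 look better because it was mostly logged in the favorable state, whereas adjusting for `S` reverses the ranking. Which variables must be included in a real problem is the question the paper's graphical criteria address. This example only shows why the choice matters.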
Abstract
Sequential decision problems are widely studied across many areas of science. A key challenge when learning policies from historical data (a practice commonly referred to as off-policy learning) is how to "identify" the impact of a policy of interest when the observed data are not randomized. Off-policy learning has mainly been studied in two settings: dynamic treatment regimes (DTRs), where the focus is on controlling confounding in medical problems with short decision horizons, and offline reinforcement learning (RL), where the focus is on dimension reduction in closed systems such as games. The gap between these two well-studied settings has limited the wider application of off-policy learning to many real-world problems. Using the theory for causal inference based on acyclic directed mixed graphs (ADMGs), we provide a set of graphical identification criteria in general decision…
Taxonomy
Topics: Economic Policies and Impacts
Methods: Sparse Evolutionary Training · Causal inference · Focus
