Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
David Bruns-Smith, Angela Zhou

TL;DR
This paper develops a robust fitted-Q-iteration method for offline reinforcement learning that accounts for unobserved confounders under a sensitivity model, with sample complexity bounds and demonstrations in simulations and on healthcare data.
Contribution
It introduces an orthogonalized robust fitted-Q-iteration algorithm that handles unobserved confounders under a sensitivity model, with improved statistical properties and practical applicability.
Findings
Effective in simulations and healthcare data
Reduces dependence on quantile estimation error
Provides sample complexity bounds and theoretical insights
Abstract
Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce where online experimentation is costly, dangerous or unethical, and where the true model is unknown. However, most methods assume all covariates used in the behavior policy's action decisions are observed. Though this assumption, sequential ignorability/unconfoundedness, likely does not hold in observational data, most of the data that accounts for selection into treatment may be observed, motivating sensitivity analysis. We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders under a sensitivity model. We propose and analyze orthogonalized robust fitted-Q-iteration that uses closed-form solutions of the robust Bellman operator to derive a loss minimization problem for the robust Q function, and adds a bias-correction to…
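To make the idea of a closed-form robust Bellman backup concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes, purely for illustration, that the sensitivity model bounds the unobserved reweighting of sampled next-state values to an interval [a, b] with a <= 1 <= b and weights averaging to one. The names `worst_case_mean`, `robust_backup`, `a`, `b`, and `gamma` are hypothetical. Under that assumption the pessimistic expectation has an explicit solution: place the largest weight on the lowest values until the weight budget runs out. The orthogonalized bias correction mentioned in the abstract is not shown.

```python
import numpy as np


def worst_case_mean(values, a, b):
    """Smallest weighted mean of `values` over weights w_i in [a, b] with mean(w) == 1.

    Assumed setup (not taken from the paper): a <= 1 <= b.
    """
    if not (a <= 1.0 <= b):
        raise ValueError("need a <= 1 <= b")
    y = np.sort(np.asarray(values, dtype=float))  # ascending: lowest values first
    n = y.size
    w = np.full(n, float(a))      # every sample starts at the lower bound
    budget = n * (1.0 - a)        # extra weight mass still to distribute
    step = b - a                  # most that any single weight can be raised
    for i in range(n):
        add = min(step, budget)
        w[i] += add               # push weight onto the smallest remaining value
        budget -= add
        if budget <= 0.0:
            break
    return float(np.mean(w * y))


def robust_backup(reward, next_values, a, b, gamma=0.99):
    """Pessimistic one-step target: reward + gamma * worst-case next value."""
    return reward + gamma * worst_case_mean(next_values, a, b)
```

With a = b = 1 the weights are forced to one and the backup reduces to the ordinary sample-mean Bellman target; widening the interval makes the target more pessimistic.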
Taxonomy
Topics: Advanced Causal Inference Techniques · Advanced Bandit Algorithms Research · Statistical Methods in Clinical Trials
