Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders

David Bruns-Smith; Angela Zhou

arXiv:2302.00662·stat.ML·October 30, 2025·1 cites

Open Access

TL;DR

This paper develops a robust fitted-Q-iteration method for offline reinforcement learning that accounts for unobserved confounders, providing theoretical guarantees and demonstrating effectiveness on healthcare data.

Contribution

The paper introduces an orthogonalized robust fitted-Q-iteration algorithm that handles unobserved confounders under a sensitivity model, with reduced dependence on quantile estimation error and accompanying sample complexity bounds.

Findings

01 Effective in simulations and healthcare data

02 Reduces dependence on quantile estimation error

03 Provides sample complexity bounds and theoretical insights

Abstract

Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce where online experimentation is costly, dangerous or unethical, and where the true model is unknown. However, most methods assume all covariates used in the behavior policy's action decisions are observed. Though this assumption, sequential ignorability/unconfoundedness, likely does not hold in observational data, most of the data that accounts for selection into treatment may be observed, motivating sensitivity analysis. We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders under a sensitivity model. We propose and analyze orthogonalized robust fitted-Q-iteration that uses closed-form solutions of the robust Bellman operator to derive a loss minimization problem for the robust Q function, and adds a bias-correction to…
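The idea of a closed-form worst case under a sensitivity model can be illustrated in a simplified single-step setting. The sketch below is not the paper's algorithm: it assumes, for illustration only, that unobserved confounding can reweight observed outcomes by factors confined to an interval [a, b] that average to one, and it computes the smallest reweighted mean. The function name `worst_case_mean` and the bounds `a`, `b` are hypothetical. The minimizer puts the largest allowed weight on the lowest outcomes up to a cutoff, which is why a quantile shows up in closed-form solutions of this kind.

```python
import numpy as np


def worst_case_mean(y, a, b):
    """Smallest value of mean(w * y) over weights with a <= w_i <= b and mean(w) = 1.

    Illustrative assumption: the bounds a <= 1 <= b stand in for a sensitivity
    parameter; they are not taken from the paper.
    """
    if not (0.0 <= a <= 1.0 <= b):
        raise ValueError("need 0 <= a <= 1 <= b")
    y = np.sort(np.asarray(y, dtype=float))  # ascending: lowest outcomes first
    n = y.size
    w = np.full(n, float(a))                 # every sample gets at least weight a
    budget = n * (1.0 - a)                   # extra weight needed so that mean(w) = 1
    if b > a:
        k = min(int(budget // (b - a)), n)   # samples that can be raised fully to b
        w[:k] = b
        if k < n:
            w[k] += budget - k * (b - a)     # fractional remainder at the cutoff
    return float(np.mean(w * y))


outcomes = [0.0, 1.0, 2.0, 3.0]
lower = worst_case_mean(outcomes, a=0.5, b=1.5)
# Negating the outcomes turns the same routine into a worst-case upper bound.
upper = -worst_case_mean([-v for v in outcomes], a=0.5, b=1.5)
nominal = worst_case_mean(outcomes, a=1.0, b=1.0)
```

With a = b = 1 the weights are all one and the function returns the ordinary sample mean; widening the interval widens the gap between `lower` and `upper`. In a fitted-Q setting, a pessimistic target of this kind would take the place of the usual expected next-step value, though the estimator and bias correction in the paper are more involved than this sketch.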

Taxonomy

Topics: Advanced Causal Inference Techniques · Advanced Bandit Algorithms Research · Statistical Methods in Clinical Trials