Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
Alizée Pace, Hugo Yèche, Bernhard Schölkopf, Gunnar Rätsch, Guy Tennenholtz

TL;DR
This paper addresses hidden confounding in offline reinforcement learning by defining delphic uncertainty, proposing a method to estimate it, and developing a pessimistic algorithm that improves decision-making despite unobserved confounders.
Contribution
The paper introduces delphic uncertainty, a measure of uncertainty arising from nonidentifiable hidden confounding, together with a practical method for estimating it and a pessimistic offline RL algorithm that mitigates confounding bias.
Findings
Effective in reducing confounding bias in experiments
Improves offline RL performance on health-related benchmarks
Demonstrates robustness to unobserved confounders
Abstract
A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a major obstacle to effective offline RL. In the present paper, we tackle the problem of hidden confounding in the nonidentifiable setting. We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty, which uses variation over world models compatible with the observations, and differentiate it from the well-known epistemic and aleatoric uncertainties. We derive a practical method for estimating the three types of uncertainties, and construct a pessimistic offline RL algorithm to account for them. Our method does not assume identifiability of the unobserved…
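As a rough illustration of the three uncertainty types named in the abstract, the sketch below splits the variance of sampled returns using the law of total variance. It is not the paper's estimator: the array layout (world models × ensemble members × return samples), the function names `decompose_uncertainty` and `pessimistic_value`, and the penalty form (mean minus a multiple of the delphic standard deviation) are all assumptions made for illustration.

```python
import numpy as np

def decompose_uncertainty(q):
    """Split the variance of sampled returns into three parts.

    q has shape (W, E, N) (an assumed layout, not from the paper):
      W world models compatible with the observations,
      E ensemble members fitted per world model,
      N sampled returns per ensemble member.
    """
    q = np.asarray(q, dtype=float)
    member_means = q.mean(axis=2)            # shape (W, E)
    world_means = member_means.mean(axis=1)  # shape (W,)
    return {
        # Noise within a single model: average variance of its samples.
        "aleatoric": q.var(axis=2).mean(),
        # Disagreement between ensemble members of the same world model.
        "epistemic": member_means.var(axis=1).mean(),
        # Disagreement between world models compatible with the data.
        "delphic": world_means.var(),
    }

def pessimistic_value(q, penalty=1.0):
    """Mean return minus a penalty on the delphic standard deviation.

    The penalty form is a hypothetical choice for this sketch.
    """
    q = np.asarray(q, dtype=float)
    delphic = decompose_uncertainty(q)["delphic"]
    return q.mean() - penalty * np.sqrt(delphic)
```

With equally sized groups and population variances (NumPy's default `ddof=0`), the three terms sum to the total variance of `q`, which makes the decomposition easy to sanity-check on synthetic data.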
Taxonomy
Topics: Sepsis Diagnosis and Treatment · Hemodynamic Monitoring and Therapy · Advanced Causal Inference Techniques
