Causal Deep Reinforcement Learning Using Observational Data
Wenxuan Zhu, Chao Yu, Qiang Zhang

TL;DR
This paper introduces deconfounding techniques for deep reinforcement learning that leverage observational data, addressing biases caused by unobserved confounders to improve learning outcomes in real-world applications.
Contribution
The paper proposes deconfounding methods that use causal inference to reweight or resample observational data in DRL and that are compatible with existing algorithms.
Findings
Deconfounding methods improve policy performance in experiments.
The methods effectively reduce bias from unobserved confounders.
Theoretical proofs support the methods' effectiveness.
Abstract
Deep reinforcement learning (DRL) requires the collection of interventional data, which is sometimes expensive and even unethical in the real world, for example in autonomous driving and medicine. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent toward undesirable outcomes if the behavior policy that generates the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance of each sample using causal inference techniques, and then adjust the impact of each sample on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These…
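As a rough illustration of the reweight-or-resample step described in the abstract, the Python sketch below assumes that a per-sample importance weight has already been computed. This excerpt does not say how the paper derives those weights, so the weights, the function names, and the squared TD-error loss used here are hypothetical stand-ins rather than the authors' method.

```python
import numpy as np


def normalize_weights(raw_weights):
    """Turn non-negative importance weights into a probability vector."""
    w = np.asarray(raw_weights, dtype=float)
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return w / w.sum()


def reweighted_loss(td_errors, raw_weights):
    """Option 1 (reweighting): weight each sample's squared TD error.

    The squared TD error is an assumed loss; any per-sample loss could
    be substituted.
    """
    td = np.asarray(td_errors, dtype=float)
    p = normalize_weights(raw_weights)
    return float(np.sum(p * td ** 2))


def resample_indices(raw_weights, num_samples, seed=0):
    """Option 2 (resampling): draw dataset indices in proportion to weight.

    Training on the resampled indices with an unweighted loss matches the
    reweighted loss in expectation.
    """
    rng = np.random.default_rng(seed)
    p = normalize_weights(raw_weights)
    return rng.choice(len(p), size=num_samples, replace=True, p=p)


if __name__ == "__main__":
    td_errors = [1.0, 2.0]
    weights = [1.0, 3.0]  # hypothetical importance weights
    print(reweighted_loss(td_errors, weights))
    print(resample_indices(weights, num_samples=8))
```

Both functions leave the learning algorithm itself untouched: the first changes only how much each sample contributes to the loss, and the second changes only which samples are fed to it.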
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Causal Inference Techniques · Auction Theory and Applications
