Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows
Minjae Cho, Jonathan P. How, and Chuangchuang Sun

TL;DR
This paper introduces MOOD-CRL, a causal-inference-based offline reinforcement learning (RL) method that uses causal normalizing flows to improve out-of-distribution (OOD) adaptation and policy performance beyond the support of the dataset.
Contribution
The paper develops causal normalizing flows for data augmentation and counterfactual reasoning, enabling offline RL to better handle distributional shift and out-of-distribution scenarios.
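The counterfactual step can be illustrated with a toy sketch. It assumes a causal normalizing flow that is autoregressive in a known causal order (here state s, action a, next state s'), with each variable an affine function of its parents plus scaled exogenous noise, so the map is invertible in the noise. A counterfactual then follows the standard abduction, action, prediction recipe: invert the flow to recover the noise, overwrite the intervened variable, and push the same noise forward again. The class name `AffineCausalFlow`, the hand-set linear weights, and the three-variable layout are hypothetical illustrations, not MOOD-CRL's actual model; a learned flow would typically use neural networks for the location and scale terms.

```python
import numpy as np


class AffineCausalFlow:
    """Hypothetical toy causal normalizing flow over variables in causal order.

    Each variable is x_i = W[i] @ x + b[i] + exp(log_scale[i]) * u_i, which is
    invertible in the exogenous noise u_i. W must be strictly lower-triangular,
    so x_i only depends on variables that come before it in the causal order.
    """

    def __init__(self, weights, bias, log_scale):
        self.W = np.asarray(weights, dtype=float)  # W[i, j]: effect of x_j on x_i
        self.b = np.asarray(bias, dtype=float)
        self.log_scale = np.asarray(log_scale, dtype=float)
        assert np.allclose(np.triu(self.W), 0.0), "W must be strictly lower-triangular"

    def forward(self, u, interventions=None):
        """Map noise u to observations x, optionally under do(x_k = v) for k in interventions."""
        interventions = interventions or {}
        x = np.zeros_like(self.b)
        for i in range(len(x)):  # causal order: all parents of x_i are already set
            if i in interventions:
                x[i] = interventions[i]  # do-operator: ignore parents and noise
            else:
                x[i] = self.W[i] @ x + self.b[i] + np.exp(self.log_scale[i]) * u[i]
        return x

    def inverse(self, x):
        """Abduction: recover the noise u that explains an observation x."""
        x = np.asarray(x, dtype=float)
        return (x - self.W @ x - self.b) / np.exp(self.log_scale)

    def counterfactual(self, x_obs, interventions):
        """Abduction -> action -> prediction, reusing the abducted noise."""
        u = self.inverse(x_obs)
        return self.forward(u, interventions)


# Toy logged transition (s, a, s'): a = 0.5 * s + u1, s' = s + 2 * a + u2 (hypothetical).
flow = AffineCausalFlow(
    weights=[[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 2.0, 0.0]],
    bias=[0.0, 0.0, 0.0],
    log_scale=[0.0, 0.0, 0.0],
)
x_obs = np.array([1.0, 0.6, 3.5])
x_cf = flow.counterfactual(x_obs, {1: -1.0})  # "what if action index 1 had been -1.0?"
# s is unchanged, a is set to -1.0, and s' becomes about 0.3.
```

Because the abducted noise is held fixed, the counterfactual next state describes what this particular logged transition would have looked like under a different action. Transitions generated this way are one plausible route for augmenting an offline dataset beyond the logged actions.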
Findings
Significantly outperforms existing offline RL methods.
Demonstrates effective counterfactual reasoning capabilities.
Enhances OOD adaptation in offline policy training.
Abstract
Despite the notable successes of Reinforcement Learning (RL), the prevalent use of an online learning paradigm prevents its widespread adoption, especially in hazardous or costly scenarios. Offline RL has emerged as an alternative solution, learning from pre-collected static datasets. However, offline learning introduces a new challenge known as distributional shift, which degrades performance when the policy is evaluated on scenarios that are Out-Of-Distribution (OOD) from the training dataset. Most existing offline RL methods resolve this issue by regularizing policy learning within the information supported by the given dataset. However, such regularization overlooks the potential for high-reward regions that may exist beyond the dataset. This motivates exploring novel offline learning techniques that can make improvements beyond the data support without compromising policy performance,…
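For contrast, the support-constrained regularization the abstract describes can be sketched with a generic behavior-regularized actor objective, similar in spirit to TD3+BC (without its Q-value normalization). This is not part of MOOD-CRL. The function name `regularized_actor_loss`, the toy critic `q_fn`, the weight `lam`, and the grid search over a scalar action are illustrative assumptions.

```python
import numpy as np


# Generic behavior-regularized actor objective (TD3+BC-like); NOT MOOD-CRL.
# All names and values here are hypothetical illustrations.
def regularized_actor_loss(action, data_action, q_fn, lam=1.0):
    # RL term: lower loss for higher critic values Q(s, a).
    # BC term: squared distance that penalizes leaving the logged action.
    return -lam * q_fn(action) + (action - data_action) ** 2


def q_fn(a):
    # Toy critic whose value keeps rising beyond the dataset's action a = 0.
    return 4.0 * a


data_action = 0.0
grid = np.linspace(-5.0, 5.0, 1001)  # candidate actions, step 0.01
losses = regularized_actor_loss(grid, data_action, q_fn, lam=1.0)
best_action = grid[np.argmin(losses)]
# Setting d/da [-lam * 4a + (a - data_action)^2] = 0 gives a* = data_action + 2 * lam,
# so best_action is about 2.0 even though q_fn is largest at the grid edge a = 5.
```

The penalty keeps the policy within a fixed distance of the logged action however much higher the critic's values are further out, which is the limitation the abstract points to when it notes that such regularization overlooks high-reward regions beyond the dataset.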
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Data Stream Mining Techniques · Reinforcement Learning in Robotics
Methods: Causal inference
