Causal Flow Q-Learning for Robust Offline Reinforcement Learning
Mingxuan Li, Junzhe Zhang, Elias Bareinboim

TL;DR
This paper introduces a causal flow Q-learning method for offline reinforcement learning that effectively handles confounded observational data, especially from pixel-based demonstrations, by optimizing worst-case performance and employing a deep discriminator.
Contribution
It proposes a novel causal offline RL objective and a practical flow-matching policy learning approach robust to confounding biases in offline data.
Findings
Achieves a 120% higher success rate on 25 pixel-based tasks.
Effectively mitigates confounding biases in offline RL.
Demonstrates robustness in complex, real-world scenarios.
Abstract
Expressive policies based on flow matching have recently been applied successfully in reinforcement learning (RL) because they can model complex action distributions from offline data. These algorithms build on standard policy gradients, which assume that there is no unmeasured confounding in the data. However, this condition does not necessarily hold for pixel-based demonstrations when the demonstrator's and the learner's sensory capabilities are mismatched, which leads to implicit confounding biases in the offline data. We address this challenge by investigating confounded observations in offline RL from a causal perspective. We develop a novel causal offline RL objective that optimizes a policy's worst-case performance under the confounding biases that may be present. Based on this new objective, we introduce a practical implementation that learns expressive…
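For readers unfamiliar with flow-matching policies, the following is a minimal sketch of a generic linear-path flow-matching loss and an Euler sampler for actions. It is not the paper's causal objective or implementation: the function names (`flow_matching_pair`, `flow_matching_loss`, `sample_action`), the `velocity_fn(state, x, t)` interface, and the Gaussian noise source are all illustrative assumptions.

```python
import random


def flow_matching_pair(action, rng):
    """Build one training example for a linear-path flow-matching loss.

    Hypothetical helper: interpolates between Gaussian noise x0 and a
    dataset action x1, and returns the point x_t, the time t, and the
    regression target x1 - x0.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in action]  # x0 ~ N(0, I)
    t = rng.random()                               # t ~ U[0, 1)
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    target_velocity = [a - n for n, a in zip(noise, action)]
    return x_t, t, target_velocity


def flow_matching_loss(velocity_fn, state, action, rng):
    """Mean squared error between predicted and target velocity."""
    x_t, t, target = flow_matching_pair(action, rng)
    pred = velocity_fn(state, x_t, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def sample_action(velocity_fn, state, dim, rng, steps=10):
    """Draw an action by Euler-integrating the velocity field from t=0 to t=1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(state, x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In practice `velocity_fn` would be a neural network conditioned on the observation, trained by minimizing this loss over offline (state, action) pairs. The sketch treats the offline data as unconfounded, which is exactly the assumption the abstract argues can fail for pixel-based demonstrations.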
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
