Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
Hongyu Zang, Xin Li, Leiji Zhang, Yang Liu, Baigui Sun, Riashat Islam, Remi Tachet des Combes, Romain Laroche

TL;DR
This paper investigates why bisimulation-based state representations underperform in offline RL, identifies key issues like missing transitions and reward scaling, and proposes solutions that improve performance on benchmark tasks.
Contribution
It analyzes the pitfalls of bisimulation in offline RL, introduces an expectile operator and reward scaling strategies, and demonstrates improved results on benchmark datasets.
Findings
Bisimulation methods struggle with missing transitions in offline data.
Reward scaling is crucial to prevent feature collapse in representations.
Applying the expectile operator together with reward scaling improves performance on benchmarks.
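To make the expectile idea in the findings above concrete, here is a minimal sketch of the generic expectile regression loss and a scalar expectile estimate. It is not the paper's implementation: the function names, the default tau of 0.7, and the use of NumPy are assumptions made for illustration, and how the paper plugs the operator into the bisimulation objective is not shown on this page.

```python
import numpy as np


def expectile_loss(residual, tau=0.7):
    """Asymmetric squared loss (generic expectile regression).

    Positive residuals are weighted by tau, negative ones by 1 - tau.
    With tau = 0.5 this is half of the ordinary mean squared error.
    `tau=0.7` is an illustrative default, not a value from the paper.
    """
    residual = np.asarray(residual, dtype=float)
    weight = np.where(residual > 0, tau, 1.0 - tau)
    return float(np.mean(weight * residual ** 2))


def expectile(samples, tau=0.7, iters=100):
    """Estimate the tau-expectile of a 1-D sample by fixed-point iteration.

    The expectile m minimises expectile_loss(samples - m, tau), which gives
    the first-order condition m = sum(w * x) / sum(w) with weights w that
    depend on whether each sample lies above m.
    """
    samples = np.asarray(samples, dtype=float)
    m = samples.mean()
    for _ in range(iters):
        w = np.where(samples > m, tau, 1.0 - tau)
        m = np.sum(w * samples) / np.sum(w)
    return float(m)


if __name__ == "__main__":
    targets = [0.0, 1.0, 2.0]
    print(expectile(targets, tau=0.5))  # coincides with the mean
    print(expectile(targets, tau=0.9))  # pulled toward the largest target
```

Setting tau to 0.5 recovers the ordinary mean, while larger values of tau move the estimate toward the upper end of the observed targets. The estimate always stays within the range of the observed samples.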
Abstract
While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, they have even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an…
Taxonomy
Topics: Reinforcement Learning in Robotics · Software Engineering Research · Adversarial Robustness in Machine Learning
