Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning
Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu

TL;DR
This paper introduces reward-consistent dynamics models that improve generalization in offline reinforcement learning by leveraging a dynamics reward function, leading to significant performance gains across multiple benchmarks.
Contribution
The paper proposes reward-consistent dynamics models and the MOREC method, which enhances offline RL by filtering transitions based on a learned dynamics reward, improving generalization.
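The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `filter_transition`, the parameters `num_candidates` and `temperature`, and the softmax selection rule are all assumptions made for this example, and `toy_model` and `toy_reward` are hand-written stand-ins for a learned dynamics model and a learned dynamics reward.

```python
import numpy as np


def filter_transition(state, action, sample_next_state, dynamics_reward,
                      num_candidates=8, temperature=1.0, rng=None):
    """Pick one next state among model samples, favoring high dynamics reward.

    All names and the softmax rule are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw several candidate next states from the (learned) dynamics model.
    candidates = np.stack([sample_next_state(state, action, rng)
                           for _ in range(num_candidates)])
    # Score each candidate transition with the dynamics reward.
    scores = np.array([dynamics_reward(state, action, s_next)
                       for s_next in candidates], dtype=float)
    # Softmax over scores; subtracting the max keeps exp() numerically stable.
    logits = (scores - scores.max()) / temperature
    weights = np.exp(logits)
    probs = weights / weights.sum()
    idx = rng.choice(num_candidates, p=probs)
    return candidates[idx], scores[idx]


def toy_model(state, action, rng):
    # Toy stand-in for a learned model: true dynamics s' = s + a, plus noise.
    return state + action + rng.normal(scale=0.5, size=state.shape)


def toy_reward(state, action, next_state):
    # Toy stand-in for a learned dynamics reward: higher when the transition
    # is closer to the true dynamics s' = s + a.
    return -float(np.linalg.norm(next_state - (state + action)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s, a = np.zeros(2), np.ones(2)
    next_state, score = filter_transition(s, a, toy_model, toy_reward,
                                          num_candidates=16, temperature=0.1,
                                          rng=rng)
    print("selected next state:", next_state, "dynamics reward:", score)
```

A lower `temperature` makes the selection closer to always taking the highest-scoring candidate, while a higher value keeps more of the model's original sampling diversity.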
Findings
MOREC outperforms previous methods on D4RL and NeoRL benchmarks.
MOREC reaches above 95% of online RL performance on multiple tasks.
The method demonstrates strong generalization ability, recovering unseen transitions.
Abstract
Learning a precise dynamics model can be crucial for offline reinforcement learning, which, unfortunately, has been found to be quite challenging. Dynamics models that are learned by fitting historical transitions often struggle to generalize to unseen transitions. In this study, we identify a hidden but pivotal factor termed dynamics reward that remains consistent across transitions, offering a pathway to better generalization. Therefore, we propose the idea of reward-consistent dynamics models: any trajectory generated by the dynamics model should maximize the dynamics reward derived from the data. We implement this idea as the MOREC (Model-based Offline reinforcement learning with Reward Consistency) method, which can be seamlessly integrated into previous offline model-based reinforcement learning (MBRL) methods. MOREC learns a generalizable dynamics reward function from offline…
Taxonomy
Topics: Reinforcement Learning in Robotics · Muscle activation and electromyography studies
