Offline Reinforcement Learning with Reverse Model-based Imagination
Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang

TL;DR
The paper introduces ROMI, an offline RL framework that uses reverse dynamics models to generate goal-directed imaginations, yielding more conservative behavior and better benchmark performance.
Contribution
ROMI employs a reverse dynamics model and reverse policy to generate targeted imaginations, enhancing conservative generalization in offline RL.
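The reverse-imagination idea can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 1-D dynamics, the function names (reverse_rollout, toy_reverse_policy, toy_reverse_model), the reward, and the horizon are all hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of starting from a state in the offline dataset and rolling backward, so that every imagined trajectory ends at a state the dataset actually contains.

```python
import numpy as np


def reverse_rollout(target_state, reverse_policy, reverse_model, horizon, rng):
    """Roll backward from a dataset state; return transitions in forward order."""
    transitions = []
    next_state = target_state
    for _ in range(horizon):
        # Sample an action that could have led into next_state.
        action = reverse_policy(next_state, rng)
        # Predict the preceding state (and a reward) for that action.
        state, reward = reverse_model(next_state, action, rng)
        transitions.append((state, action, reward, next_state))
        next_state = state  # step one more transition back in time
    transitions.reverse()  # earliest imagined transition first
    return transitions


# Hypothetical toy stand-ins; in practice both would be learned from data.
def toy_reverse_policy(next_state, rng):
    return float(rng.choice([-1.0, 1.0]))


def toy_reverse_model(next_state, action, rng):
    state = next_state - action   # inverts the assumed forward dynamics s' = s + a
    reward = -abs(next_state)     # assumed toy reward, not from the paper
    return state, reward


rng = np.random.default_rng(0)
dataset_states = [0.0, 3.0]  # assumed states taken from an offline dataset
imagined = [
    reverse_rollout(s, toy_reverse_policy, toy_reverse_model, horizon=3, rng=rng)
    for s in dataset_states
]
```

Each imagined trajectory could then be added to the offline dataset before running a model-free learner; how ROMI trains its reverse model and reverse policy is not covered by this sketch.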
Findings
ROMI achieves state-of-the-art results on offline RL benchmarks.
ROMI generates more conservative behaviors than existing methods.
ROMI effectively combines with model-free algorithms for improved performance.
Abstract
In offline reinforcement learning (offline RL), one of the main challenges is to deal with the distributional shift between the learning policy and the given dataset. To address this problem, recent offline RL methods attempt to introduce conservatism bias to encourage learning in high-confidence areas. Model-free approaches directly encode such bias into policy or value function learning using conservative regularizations or special network structures, but their constrained policy search limits the generalization beyond the offline dataset. Model-based approaches learn forward dynamics models with conservatism quantifications and then generate imaginary trajectories to extend the offline datasets. However, due to limited samples in offline datasets, conservatism quantifications often suffer from overgeneralization in out-of-support regions. The unreliable conservative measures will…
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adaptive Dynamic Programming Control
