Offline Reinforcement Learning with Reverse Model-based Imagination

Jianhao Wang; Wenzhe Li; Haozhe Jiang; Guangxiang Zhu; Siyuan Li; Chongjie Zhang

arXiv:2110.00188 · cs.LG · November 16, 2021 · 21 cites

TL;DR

The paper introduces ROMI, an offline RL framework that uses reverse dynamics models to generate goal-directed imagined rollouts, yielding more conservative behavior and stronger performance on offline RL benchmarks.

Contribution

ROMI pairs a reverse dynamics model with a reverse policy to generate targeted imagined rollouts, enabling conservative generalization in offline RL.
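
To make the idea of reverse imagination concrete, here is a minimal sketch of a backward rollout. It is not taken from the paper or its repository. It assumes a reverse policy that proposes the action leading into a state, and a reverse dynamics model that predicts the predecessor state and reward. The names `reverse_rollout`, `toy_reverse_policy`, and `toy_reverse_model` are hypothetical, and the toy environment is a 1-D point that moves by its action.

```python
import random
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Transition = Tuple[State, Action, float, State]  # (s, a, r, s_next)


def reverse_rollout(
    goal_state: State,
    reverse_policy: Callable[[State], Action],
    reverse_model: Callable[[State, Action], Tuple[State, float]],
    horizon: int,
) -> List[Transition]:
    """Imagine `horizon` transitions backward from `goal_state`.

    The result is returned in chronological order, so the last
    transition ends exactly at `goal_state`.
    """
    transitions: List[Transition] = []
    s_next = goal_state
    for _ in range(horizon):
        a = reverse_policy(s_next)            # action assumed to lead into s_next
        s_prev, r = reverse_model(s_next, a)  # predicted predecessor and reward
        transitions.append((s_prev, a, r, s_next))
        s_next = s_prev                       # step one transition further back
    transitions.reverse()
    return transitions


# Hypothetical toy stand-ins for the learned models.
def toy_reverse_policy(s_next: State) -> Action:
    return (random.choice([-1.0, 1.0]),)


def toy_reverse_model(s_next: State, a: Action) -> Tuple[State, float]:
    s_prev = (s_next[0] - a[0],)   # inverts the toy dynamics s_next = s_prev + a
    reward = -abs(s_next[0])       # toy reward: closeness to the origin
    return s_prev, reward


random.seed(0)
rollout = reverse_rollout((0.0,), toy_reverse_policy, toy_reverse_model, horizon=3)
```

Because the rollout is generated backward from a state that would be drawn from the dataset, every imagined trajectory ends at a known state; a forward rollout offers no such guarantee about where it ends.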

Findings

1. ROMI achieves state-of-the-art results on offline RL benchmarks.
2. ROMI generates more conservative behaviors than existing methods.
3. ROMI effectively combines with model-free algorithms for improved performance.
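
One plausible way to read the combination with model-free algorithms is as data augmentation: imagined transitions are mixed with the offline dataset, and the model-free learner samples from both. The sketch below illustrates only that mixing step. The function `sample_mixed_batch` and the `real_ratio` parameter are assumptions for illustration, not names or values from the paper.

```python
import random
from typing import List, Sequence, Tuple

# (s, a, r, s_next), with scalar states and actions for brevity
Transition = Tuple[float, float, float, float]


def sample_mixed_batch(
    real: Sequence[Transition],
    imagined: Sequence[Transition],
    batch_size: int,
    real_ratio: float,
    rng: random.Random,
) -> List[Transition]:
    """Draw a batch with a fixed share of real and imagined transitions."""
    n_real = round(batch_size * real_ratio)
    n_imagined = batch_size - n_real
    batch = [rng.choice(real) for _ in range(n_real)]
    batch += [rng.choice(imagined) for _ in range(n_imagined)]
    rng.shuffle(batch)  # avoid a fixed real-then-imagined ordering
    return batch


# Toy data: real transitions carry reward 1.0, imagined ones reward 0.0,
# only so the two sources can be told apart in this example.
real_data = [(float(i), 0.0, 1.0, float(i + 1)) for i in range(10)]
imagined_data = [(float(i), 0.0, 0.0, float(i + 1)) for i in range(10)]

batch = sample_mixed_batch(real_data, imagined_data, batch_size=8,
                           real_ratio=0.75, rng=random.Random(0))
```

Any off-the-shelf model-free offline learner could then consume `batch` unchanged, since imagined and real transitions share one format.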

Abstract

In offline reinforcement learning (offline RL), one of the main challenges is to deal with the distributional shift between the learning policy and the given dataset. To address this problem, recent offline RL methods attempt to introduce conservatism bias to encourage learning in high-confidence areas. Model-free approaches directly encode such bias into policy or value function learning using conservative regularizations or special network structures, but their constrained policy search limits the generalization beyond the offline dataset. Model-based approaches learn forward dynamics models with conservatism quantifications and then generate imaginary trajectories to extend the offline datasets. However, due to limited samples in offline datasets, conservatism quantifications often suffer from overgeneralization in out-of-support regions. The unreliable conservative measures will…

Code & Models

Repositories

wenzhe-li/romi · PyTorch

Videos

Offline Reinforcement Learning with Reverse Model-based Imagination · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adaptive Dynamic Programming Control