Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief

Kaiyang Guo; Yunfeng Shao; Yanhui Geng

arXiv:2210.06692 · cs.LG · November 1, 2022 · 6 cites

TL;DR

This paper introduces a novel offline RL method that maintains a belief distribution over dynamics, using pessimism-modulated sampling to improve policy learning and achieve state-of-the-art results.

Contribution

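It proposes to evaluate and optimize policies through sampling from a belief distribution over dynamics that is biased towards pessimism, as a way to handle dynamics uncertainty, and develops both a theoretical algorithm and a practical algorithm for offline RL.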
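The biased-sampling idea summarized above can be illustrated with a minimal Python sketch. It assumes a finite (categorical) belief over candidate dynamics and one simple way to bias sampling toward pessimism: draw `n` dynamics from the belief and keep the `k`-th smallest value they imply. This order-statistic rule, the function names, and the toy belief are illustrative assumptions, not the paper's exact procedure, which the truncated abstract does not spell out.

```python
import random


def biased_sample_value(belief, value_of_model, n, k, rng):
    """Return the k-th smallest value among n dynamics drawn from a belief.

    belief: list of (model, probability) pairs, a hypothetical finite belief.
    value_of_model: callable mapping a sampled model to the value it implies.
    n: number of candidate dynamics drawn i.i.d. from the belief.
    k: rank kept (1 = most pessimistic, n = most optimistic).
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    models = [m for m, _ in belief]
    weights = [p for _, p in belief]
    # Draw n candidate dynamics from the belief (sampling with replacement).
    samples = rng.choices(models, weights=weights, k=n)
    # Sort the implied values and keep the k-th smallest: a small k biases
    # the estimate toward pessimistic dynamics.
    values = sorted(value_of_model(m) for m in samples)
    return values[k - 1]


def expected_biased_value(belief, value_of_model, n, k, trials=10_000, seed=0):
    """Monte Carlo average of biased_sample_value over many independent draws."""
    rng = random.Random(seed)
    total = sum(
        biased_sample_value(belief, value_of_model, n, k, rng)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # Hypothetical belief over three candidate dynamics and the value each implies.
    belief = [("optimistic", 0.5), ("nominal", 0.3), ("adverse", 0.2)]
    implied_value = {"optimistic": 10.0, "nominal": 5.0, "adverse": 0.0}.get
    for k in (1, 3, 5):
        print(k, expected_biased_value(belief, implied_value, n=5, k=k))
```

Because every call with the same seed draws the same samples, the printed estimates can only grow as `k` grows. In this sketch `k` plays the role of a pessimism knob: `k = 1` leans hardest on adverse dynamics, while `k = n` ignores them.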

Findings

1. Achieves state-of-the-art performance on benchmark tasks.
2. Demonstrates effective handling of dynamics uncertainty.
3. Provides theoretical guarantees for policy improvement.

Abstract

Model-based offline reinforcement learning (RL) aims to find a highly rewarding policy by leveraging a previously collected static dataset and a dynamics model. Although the dynamics model is learned by reusing the static dataset, its generalization ability may, if properly utilized, promote policy learning. To that end, several works propose to quantify the uncertainty of the predicted dynamics and explicitly apply it to penalize the reward. However, as the dynamics and the reward are intrinsically different factors in the context of an MDP, characterizing the impact of dynamics uncertainty through a reward penalty may incur an unexpected tradeoff between model utilization and risk avoidance. In this work, we instead maintain a belief distribution over dynamics, and evaluate/optimize the policy through biased sampling from the belief. The sampling procedure, biased towards pessimism, is derived based on…
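To make the concern about reward penalties concrete, here is a toy Python comparison under hypothetical numbers. An ensemble of dynamics models disagrees about the next state, but the value function is flat over every predicted state. A reward penalty proportional to the disagreement (here the population standard deviation of the predictions, an assumed uncertainty measure) lowers the one-step backup anyway. Treating the ensemble members as samples from a dynamics belief and backing up the most pessimistic implied value leaves the backup unchanged, because the disagreement does not affect value. All constants and function names are illustrative assumptions, not the paper's setup.

```python
import statistics

# Hypothetical toy setting (not from the paper): four ensemble members, each
# treated as one sample from a dynamics belief, predict a scalar next state
# for the same (state, action) pair.
PREDICTED_NEXT_STATES = [0.0, 1.0, 2.0, 3.0]
REWARD = 1.0
GAMMA = 0.99
LAM = 1.0  # assumed penalty coefficient


def value(next_state):
    # Assumed value function that is flat over this region, so the members'
    # disagreement about the next state does not change the value at all.
    return 10.0


def reward_penalty_backup():
    # Reward-penalty style: subtract LAM times the spread of the predictions
    # (population standard deviation, an assumed uncertainty measure).
    uncertainty = statistics.pstdev(PREDICTED_NEXT_STATES)
    mean_value = statistics.fmean(value(s) for s in PREDICTED_NEXT_STATES)
    return REWARD - LAM * uncertainty + GAMMA * mean_value


def pessimistic_sampling_backup(k=1):
    # Belief-sampling style: rank the values implied by the sampled dynamics
    # and back up the k-th smallest (k=1 is the most pessimistic choice).
    values = sorted(value(s) for s in PREDICTED_NEXT_STATES)
    return REWARD + GAMMA * values[k - 1]


if __name__ == "__main__":
    print(round(reward_penalty_backup(), 3), round(pessimistic_sampling_backup(), 3))
```

Run as a script, it prints `9.782 10.9`. The penalty removes the full spread of the predictions (about 1.118) from the backup even though every sampled dynamics implies the same value, which is one concrete form of the model-utilization versus risk-avoidance tradeoff the abstract describes.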


Videos

Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics