Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Jihwan Jeong; Xiaoyu Wang; Michael Gimelfarb; Hyunwoo Kim; Baher Abdulhai; Scott Sanner

arXiv:2210.03802·cs.LG·March 6, 2023

TL;DR

CBOP introduces a conservative Bayesian approach for offline RL that adaptively combines model-based and model-free estimates based on uncertainty, significantly improving performance over prior methods.
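To make "combining estimates based on uncertainty" concrete, here is a minimal sketch of one generic way to do it: precision-weighted fusion of several Gaussian estimates of the same value, followed by a lower confidence bound for conservatism. This is an illustration under stated assumptions, not the authors' implementation. The function name `fuse_estimates`, the `lcb_coef` parameter, and the example numbers are all hypothetical, and it assumes each estimate (for example, a model-free bootstrap and several model-based rollouts of different lengths) arrives with its own mean and variance.

```python
import math


def fuse_estimates(means, variances, lcb_coef=1.0):
    """Fuse independent Gaussian estimates of one quantity (illustrative only).

    means[i], variances[i]: the i-th estimate and its uncertainty, e.g. index 0
    could be a model-free estimate and later indices longer model rollouts.
    lcb_coef: hypothetical knob controlling how conservative the result is.
    """
    # Precision (inverse variance) measures how much each estimate is trusted.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    # Normalized weights: uncertain estimates contribute less.
    weights = [p / total_precision for p in precisions]

    # Mean and variance of the fused Gaussian.
    fused_mean = sum(w * m for w, m in zip(weights, means))
    fused_var = 1.0 / total_precision

    # Conservative value: subtract a multiple of the fused standard deviation.
    conservative = fused_mean - lcb_coef * math.sqrt(fused_var)
    return weights, fused_mean, fused_var, conservative


# Hypothetical numbers: the third estimate is very uncertain, so it is
# down-weighted relative to the first two.
weights, fused_mean, fused_var, conservative = fuse_estimates(
    means=[10.0, 12.0, 20.0], variances=[1.0, 4.0, 100.0]
)
```

With this weighting, an estimate whose variance grows (as model rollouts typically do when errors compound) automatically loses influence, and the final subtraction keeps the target pessimistic when all estimates are uncertain.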

Contribution

It proposes a novel conservative Bayesian value expansion method that effectively balances model reliance and uncertainty in offline policy optimization.

Findings

01 Significantly outperforms prior model-based methods such as MOPO, MOReL, and COMBO.

02 Achieves state-of-the-art results on 11 of the 18 D4RL benchmark datasets.

03 Demonstrates robust performance across diverse offline RL datasets.

Abstract

Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since they can extract more learning signals from the logged dataset by learning a model of the environment. However, the performance of existing model-based approaches falls short of model-free counterparts, due to the compounding of estimation errors in the learned model. Driven by this observation, we argue that it is critical for a model-based method to understand when to trust the model and when to rely on model-free estimates, and how to act conservatively w.r.t. both. To this end, we derive an elegant and simple methodology called conservative Bayesian model-based value expansion for offline policy optimization (CBOP), that trades off model-free and…

Code & Models

Repositories

jihwan-jeong/cbop
PyTorch · Official

Videos

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms