Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Paria Rashidinejad; Banghua Zhu; Cong Ma; Jiantao Jiao; Stuart Russell

arXiv:2103.12021·cs.LG·July 4, 2023

TL;DR

This paper introduces a unified offline RL framework that interpolates between imitation learning and vanilla offline RL, and proposes an adaptive algorithm with optimal convergence rates across different data compositions.

Contribution

The paper presents a new framework based on a weak concentrability coefficient and develops a lower confidence bound (LCB) algorithm that adapts to unknown data compositions with minimax optimal rates.
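To make the pessimism idea concrete, here is a minimal sketch of a lower confidence bound rule for an offline multi-armed bandit. It is an illustration under stated assumptions, not the paper's exact algorithm: the function names `lcb_values` and `lcb_select_arm`, the Hoeffding-style penalty, the failure probability `delta`, and the assumption that rewards lie in [0, 1] are all choices made for this example.

```python
import math


def lcb_values(dataset, n_arms, delta=0.1):
    """Return a lower confidence bound on the mean reward of each arm.

    dataset: iterable of (arm, reward) pairs, rewards assumed to lie in [0, 1].
    The penalty is a Hoeffding-style bonus (an assumption for this sketch).
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for arm, reward in dataset:
        counts[arm] += 1
        sums[arm] += reward

    lcbs = []
    for arm in range(n_arms):
        if counts[arm] == 0:
            # Never observed: be maximally pessimistic about this arm.
            lcbs.append(-math.inf)
            continue
        mean = sums[arm] / counts[arm]
        penalty = math.sqrt(math.log(2 * n_arms / delta) / (2 * counts[arm]))
        lcbs.append(mean - penalty)
    return lcbs


def lcb_select_arm(dataset, n_arms, delta=0.1):
    """Pick the arm with the largest lower confidence bound."""
    lcbs = lcb_values(dataset, n_arms, delta)
    return max(range(n_arms), key=lambda arm: lcbs[arm])


if __name__ == "__main__":
    # Arm 0 is well covered by the data; arm 1 was seen once with a lucky reward.
    data = [(0, 0.6)] * 100 + [(1, 1.0)]
    print(lcb_select_arm(data, n_arms=2))
```

In this toy dataset a greedy rule on empirical means would pick arm 1 on the strength of a single sample, while the penalized rule prefers the well-covered arm 0. The penalty shrinks as an arm's sample count grows, which is what lets the rule lean on whatever part of the data is well covered.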

Findings

01. LCB achieves a $1/N$ rate for nearly-expert datasets.

02. LCB is adaptively optimal across the entire data composition range in contextual bandits.

03. LCB is nearly adaptively optimal in MDPs.

Abstract

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection. Based on the composition of the offline dataset, two main categories of methods are used: imitation learning, which is suitable for expert datasets, and vanilla offline RL, which often requires uniform coverage datasets. From a practical standpoint, datasets often deviate from these two extremes, and the exact data composition is usually unknown a priori. To bridge this gap, we present a new offline RL framework that smoothly interpolates between the two extremes of data composition, hence unifying imitation learning and vanilla offline RL. The new framework is centered around a weak version of the concentrability coefficient that measures the deviation from the behavior policy to the expert policy alone. Under this new framework, we further…
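The weak concentrability coefficient mentioned in the abstract can be sketched as a single density ratio. The notation here is assumed for illustration and does not appear in the text above: $d^\star(s,a)$ denotes the state-action occupancy of the expert (optimal) policy, $\mu(s,a)$ the distribution that generated the offline data, and $0/0$ is read as $0$.

```latex
\[
  C^\star \;=\; \max_{s,a}\; \frac{d^\star(s,a)}{\mu(s,a)}
\]
```

Under this reading, $C^\star = 1$ when the data is drawn exactly from the expert's occupancy ($\mu = d^\star$), which corresponds to the imitation learning end of the spectrum, and $C^\star$ grows as the data drifts away from the state-action pairs the expert visits. The ratio only involves the expert policy, so it requires no coverage of state-action pairs the expert never reaches.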

Videos

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism · slideslive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Data Classification