Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

Tatsuya Matsushima; Hiroki Furuta; Yutaka Matsuo; Ofir Nachum; Shixiang Gu

arXiv:2006.03647·cs.LG·June 24, 2020·50 cites

TL;DR

This paper introduces BREMEN, a model-based offline RL algorithm that significantly reduces the number of environment deployments needed for policy learning, making RL more practical for real-world applications with deployment constraints.

Contribution

The paper proposes BREMEN, a novel model-based offline RL method that achieves high deployment efficiency by requiring fewer environment interactions and deployments than existing algorithms.

Findings

01. BREMEN learns successful policies with only 5-10 deployments.

02. It uses 10-20 times less data than prior offline RL methods.

03. Recursive application of BREMEN maintains or improves sample efficiency.

Abstract

Most reinforcement learning (RL) algorithms assume online access to the environment, in which one may readily interleave updates to the policy with experience collection using that policy. However, in many real-world applications such as health, education, dialogue agents, and robotics, the cost or potential risk of deploying a new data-collection policy is high, to the point that it can become prohibitive to update the data-collection policy more than a few times during learning. With this view, we propose a novel concept of deployment efficiency, measuring the number of distinct data-collection policies that are used during policy learning. We observe that naïvely applying existing model-free offline RL algorithms recursively does not lead to a practical deployment-efficient and sample-efficient algorithm. We propose a novel model-based algorithm, Behavior-Regularized…
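
The abstract defines deployment efficiency as the number of distinct data-collection policies used during learning. The sketch below is one way to make that count concrete; it is not BREMEN and not the authors' code. The names `DeploymentLog` and `run`, the integer "policy version" standing in for a real policy, and all batch sizes are hypothetical choices made only for illustration. It contrasts an online-style loop (a new data-collection policy for every sample) with a loop that deploys only a few times and does all other updates offline.

```python
from dataclasses import dataclass


@dataclass
class DeploymentLog:
    deployments: int = 0  # number of distinct data-collection policies used
    samples: int = 0      # total transitions collected from the environment


def run(num_deployments, batch_size, updates_per_deployment):
    """Toy loop: deploy, collect a batch, then update offline (illustrative only)."""
    log = DeploymentLog()
    policy_version = 0  # hypothetical stand-in for the current policy
    dataset = []        # (version of the policy that collected it, sample index)
    for _ in range(num_deployments):
        # One deployment: the current policy is frozen and collects a whole batch.
        dataset.extend((policy_version, i) for i in range(batch_size))
        log.deployments += 1
        log.samples += batch_size
        # Offline updates change the policy but never touch the environment,
        # so they cost neither samples nor deployments.
        for _ in range(updates_per_deployment):
            policy_version += 1
    return log, dataset


# Same sample budget, very different deployment counts (numbers are made up).
online_log, _ = run(num_deployments=1000, batch_size=1, updates_per_deployment=1)
limited_log, limited_data = run(num_deployments=5, batch_size=200, updates_per_deployment=1000)

print(online_log)   # 1000 deployments, 1000 samples
print(limited_log)  # 5 deployments, 1000 samples
# The deployment count equals the number of distinct policies present in the data.
print(len({version for version, _ in limited_data}))
```

Both runs consume the same number of samples, so sample efficiency alone cannot tell them apart; the deployment count is what separates them, which is the distinction the abstract draws.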

Code & Models

Videos

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Modular Robots and Swarm Intelligence · Mobile Crowdsensing and Crowdsourcing