Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows

Dmitriy Akimov; Vladislav Kurenkov; Alexander Nikulin; Denis Tarasov; Sergey Kolesnikov

arXiv:2211.11096·cs.LG·January 31, 2023

TL;DR

This paper introduces a novel offline reinforcement learning method that uses a Normalizing Flows-based latent action space to improve policy conservatism and performance without extra regularization.

Contribution

It proposes a pre-trained Normalizing Flows generative model as a conservative action encoder in the latent space for offline RL, avoiding out-of-dataset actions and enhancing performance.

Findings

01. Outperforms recent algorithms on locomotion tasks

02. Effective in handling distributional shift

03. Reduces extrapolation error

Abstract

Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism - i.e., keeping the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model - controller in the latent space - is…
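The two-stage idea in the abstract (fit an invertible action model on the dataset, then let a controller act in its latent space) can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not the paper's architecture: the flow is a single state-conditional affine layer with a linear conditioner, the latent prior is a standard normal, and `ConditionalAffineFlow`, `latent_controller`, `theta`, and `z_max` are hypothetical names. The point it shows is only that bounding the latent bounds the decoded action around what the behavior data supports.

```python
import numpy as np


class ConditionalAffineFlow:
    """Toy one-layer flow (illustrative assumption): a = mu(s) + sigma * z."""

    def fit(self, states, actions):
        # For a linear-Gaussian model, maximum likelihood reduces to least squares.
        X = np.hstack([states, np.ones((len(states), 1))])
        self.W, *_ = np.linalg.lstsq(X, actions, rcond=None)
        residual = actions - X @ self.W
        self.log_sigma = np.log(residual.std(axis=0) + 1e-6)
        return self

    def _mu(self, states):
        X = np.hstack([states, np.ones((len(states), 1))])
        return X @ self.W

    def encode(self, states, actions):
        # Inverse direction: dataset action -> latent code.
        return (actions - self._mu(states)) * np.exp(-self.log_sigma)

    def decode(self, states, latents):
        # Forward direction: latent code -> environment action.
        return self._mu(states) + np.exp(self.log_sigma) * latents

    def log_prob(self, states, actions):
        # Change of variables: log N(z; 0, I) minus the log-scale of the map.
        z = self.encode(states, actions)
        base = -0.5 * (z ** 2 + np.log(2 * np.pi))
        return (base - self.log_sigma).sum(axis=1)


def latent_controller(states, theta, z_max=2.0):
    """Hypothetical latent policy; tanh keeps each latent in [-z_max, z_max]."""
    return z_max * np.tanh(states @ theta)


# Synthetic stand-in for an offline dataset (not real benchmark data).
rng = np.random.default_rng(0)
states = rng.normal(size=(512, 3))
actions = states @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(512, 2))

flow = ConditionalAffineFlow().fit(states, actions)  # stage 1: supervised pre-training
theta = rng.normal(size=(3, 2))                      # stage 2: untrained controller weights
z = latent_controller(states, theta)
decoded = flow.decode(states, z)                     # actions sent to the environment
```

Because the decoder is frozen and the controller can only emit latents of magnitude at most `z_max`, every decoded action lies within `z_max` fitted standard deviations of the behavior mean for that state. In this sketch the constraint comes from the bounded latent rather than from a separate regularization term; a practical system would use a deeper flow and train the controller with an RL objective.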

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)

Methods: Normalizing Flows