Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
Dmitriy Akimov, Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov

TL;DR
This paper introduces a novel offline reinforcement learning method that uses a Normalizing Flows-based latent action space to improve policy conservatism and performance without extra regularization.
Contribution
It proposes a pre-trained Normalizing Flows generative model as a conservative action encoder in the latent space for offline RL, avoiding out-of-dataset actions and enhancing performance.
Findings
Outperforms recent algorithms on locomotion tasks
Effective in handling distributional shift
Reduces extrapolation error
Abstract
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism, i.e., keeping the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model (a controller in the latent space) is…
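The idea of a state-conditioned flow serving as an action encoder, with a separate controller acting in its latent space, can be illustrated with a minimal sketch. This is not the authors' implementation: the single affine layer, the Gaussian base distribution, the tanh-bounded controller, and all names (`decode`, `encode`, `log_prob`, `latent_controller`, the weight matrices) are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 3, 2

# Hypothetical conditioner weights. In practice these would be fit by
# maximizing log_prob(a, s) over (state, action) pairs from the offline dataset.
W_SCALE = 0.1 * rng.normal(size=(STATE_DIM, ACTION_DIM))
W_SHIFT = 0.1 * rng.normal(size=(STATE_DIM, ACTION_DIM))
# Hypothetical controller weights (stand-in for the latent-space policy).
W_PI = 0.5 * rng.normal(size=(STATE_DIM, ACTION_DIM))


def decode(z, s):
    """Latent -> action: one state-conditioned affine flow layer."""
    log_scale = s @ W_SCALE
    shift = s @ W_SHIFT
    return z * np.exp(log_scale) + shift


def encode(a, s):
    """Action -> latent (exact inverse of decode), plus log|det dz/da|."""
    log_scale = s @ W_SCALE
    shift = s @ W_SHIFT
    z = (a - shift) * np.exp(-log_scale)
    log_det = -np.sum(log_scale, axis=-1)
    return z, log_det


def log_prob(a, s):
    """Change-of-variables log-likelihood under a standard Gaussian base
    (the base distribution is an assumption of this sketch)."""
    z, log_det = encode(a, s)
    base = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * ACTION_DIM * np.log(2 * np.pi)
    return base + log_det


def latent_controller(s):
    """Stand-in policy: tanh keeps the latent in (-1, 1), so decoded
    actions come from a bounded region of the flow's latent space."""
    return np.tanh(s @ W_PI)


states = rng.normal(size=(4, STATE_DIM))
actions = rng.normal(size=(4, ACTION_DIM))

# Stage 1 (supervised pre-training) would maximize this quantity.
nll = -np.mean(log_prob(actions, states))

# Stage 2: the controller picks a latent, the frozen flow decodes it.
z_pi = latent_controller(states)
a_pi = decode(z_pi, states)
```

Because the flow is invertible, every latent decodes to exactly one action and every dataset action has an exact latent code; restricting the controller to a bounded latent region is one way to keep decoded actions near the data the flow was fit on.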
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)
Methods: Normalizing Flows
