Information asymmetry in KL-regularized RL

Alexandre Galashov; Siddhant M. Jayakumar; Leonard Hasenclever; Dhruva Tirumala; Jonathan Schwarz; Guillaume Desjardins; Wojciech M. Czarnecki; Yee Whye Teh; Razvan Pascanu; Nicolas Heess

arXiv:1905.01240·cs.LG·May 6, 2019·24 cites

TL;DR

This paper studies KL-regularized reinforcement learning in which the default policy is learned from data under restricted access to information, so that it captures reusable behaviors arising from repeated structure in the environment and speeds up learning of the main policy.

Contribution

It introduces a method to learn a default policy constrained by information limits within KL-regularized RL, connecting it to information bottleneck and variational EM frameworks.
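The idea of a default policy with restricted information can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the linear-softmax policies, the feature mask `default_mask`, the weight names, and the values of `alpha` and `gamma`. The sketch only shows how a per-step KL penalty between the policy and an information-restricted default policy enters the return.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_kl(p, q, eps=1e-12):
    """KL(p || q) for categorical distributions along the last axis."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def kl_regularized_return(rewards, policy_probs, default_probs,
                          alpha=0.1, gamma=0.99):
    """Discounted sum of r_t - alpha * KL(pi_t || pi0_t) for one trajectory."""
    kl = categorical_kl(policy_probs, default_probs)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * (rewards - alpha * kl)))

rng = np.random.default_rng(0)
T, obs_dim, n_actions = 5, 6, 3
obs = rng.normal(size=(T, obs_dim))   # toy observations
rewards = rng.normal(size=T)          # toy rewards

# Hypothetical split: the first four features are task-agnostic,
# the last two carry goal information hidden from the default policy.
default_mask = np.array([1, 1, 1, 1, 0, 0], dtype=float)

W_pi = rng.normal(size=(obs_dim, n_actions))  # policy weights (illustrative)
W_0 = rng.normal(size=(obs_dim, n_actions))   # default-policy weights (illustrative)

pi = softmax(obs @ W_pi)                      # policy sees the full observation
pi0 = softmax((obs * default_mask) @ W_0)     # default policy sees masked input

objective = kl_regularized_return(rewards, pi, pi0)
```

Because the default policy never sees the masked features, it cannot model goal-specific behavior and can only capture what is shared across goals. In a full training loop both sets of weights would be updated, which this sketch leaves out.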

Findings

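01 Learning a default policy alongside the policy speeds up training.

02 Restricting the information the default policy receives forces it to learn reusable behaviors.

03 Empirical results in both discrete and continuous action domains show that, for certain tasks, learning is significantly faster.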

Abstract

Many real-world tasks exhibit rich structure that is repeated across different parts of the state space or in time. In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning. We start from the KL-regularized expected reward objective, which introduces an additional component, a default policy. Instead of relying on a fixed default policy, we learn it from data. Crucially, we restrict the amount of information the default policy receives, forcing it to learn reusable behaviors that help the policy learn faster. We formalize this strategy and discuss connections to information bottleneck approaches and to the variational EM algorithm. We present empirical results in both discrete and continuous action domains and demonstrate that, for certain tasks, learning a default policy alongside the policy can significantly speed up and…
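For orientation, the KL-regularized expected reward objective mentioned in the abstract is commonly written in the form below. The notation is an assumption made here and does not appear on this page: \pi is the policy, \pi_0 the default policy, \alpha a regularization weight, \gamma a discount factor, x_t the information available to the policy at step t, and x_t^D the restricted subset of it passed to the default policy.

```latex
\mathcal{L}(\pi, \pi_0) =
  \mathbb{E}_{\tau \sim \pi}\left[
    \sum_{t \ge 0} \gamma^{t} \Big(
      r(s_t, a_t)
      - \alpha \, \mathrm{KL}\big[\, \pi(\cdot \mid x_t) \,\big\|\, \pi_0(\cdot \mid x_t^{D}) \,\big]
    \Big)
  \right]
```

With a fixed uniform \pi_0 the KL term reduces, up to a constant, to an entropy bonus. Learning \pi_0 while limiting it to x_t^D is what keeps it from simply copying \pi.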

Code & Models

Repositories

RobvanGastel/svg-priors (PyTorch)

Taxonomy

Topics: Neural Networks and Applications · Reinforcement Learning in Robotics · Machine Learning and Algorithms

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings