Exploiting Hierarchy for Learning and Transfer in KL-regularized RL

Dhruva Tirumala; Hyeonwoo Noh; Alexandre Galashov; Leonard Hasenclever; Arun Ahuja; Greg Wayne; Razvan Pascanu; Yee Whye Teh; Nicolas Heess

arXiv:1903.07438 · cs.LG · January 24, 2020 · 20 cites

Open Access

TL;DR

This paper explores hierarchical structures with latent variables in KL-regularized reinforcement learning to improve learning efficiency and transferability across diverse tasks.

Contribution

It augments both the policy and the default behavior in KL-regularized RL with latent variables. The resulting hierarchical structures can implement different inductive biases, and their modularity can benefit transfer.

Findings

01 Faster learning observed in continuous control tasks

02 Enhanced transferability across tasks demonstrated

03 Hierarchical latent structures improve RL performance

Abstract

As reinforcement learning agents are tasked with solving more challenging and diverse tasks, the ability to incorporate prior knowledge into the learning system and to exploit reusable structure in solution space is likely to become increasingly important. The KL-regularized expected reward objective constitutes one possible tool to this end. It introduces an additional component, a default or prior behavior, which can be learned alongside the policy and as such partially transforms the reinforcement learning problem into one of behavior modelling. In this work we consider the implications of this framework in cases where both the policy and default behavior are augmented with latent variables. We discuss how the resulting hierarchical structures can be used to implement different inductive biases and how their modularity can benefit transfer. Empirically we find that they can lead to…
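As a rough illustration of the KL-regularized expected reward objective described in the abstract, the sketch below scores a single trajectory as the discounted sum of reward minus a scaled KL divergence between the policy and the default behavior. It is a minimal sketch under stated assumptions, not the authors' implementation: discrete action distributions, the helper names `kl_categorical` and `kl_regularized_return`, and the parameters `alpha` (KL weight) and `gamma` (discount) are all hypothetical choices made for this example.

```python
import math


def kl_categorical(p, q):
    """KL[p || q] for two discrete distributions given as probability lists.

    Assumes q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_regularized_return(rewards, policy_probs, default_probs,
                          alpha=0.1, gamma=0.99):
    """Discounted sum of (reward - alpha * KL[policy || default]) over one trajectory.

    rewards:       list of per-step rewards
    policy_probs:  list of per-step action distributions under the policy
    default_probs: list of per-step action distributions under the default behavior
    alpha, gamma:  hypothetical KL weight and discount factor
    """
    total = 0.0
    for t, (r, p, q) in enumerate(zip(rewards, policy_probs, default_probs)):
        total += gamma ** t * (r - alpha * kl_categorical(p, q))
    return total


# Usage: a two-step trajectory where the policy deviates from a uniform default.
rewards = [1.0, 1.0]
default = [[0.5, 0.5], [0.5, 0.5]]
policy = [[0.9, 0.1], [0.9, 0.1]]
score = kl_regularized_return(rewards, policy, default, alpha=0.1, gamma=0.5)
```

When the policy matches the default behavior the KL term vanishes and the score reduces to the ordinary discounted return; any deviation from the default is penalized in proportion to `alpha`.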

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms