Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization

Takayuki Osa; Voot Tangkaratt; Masashi Sugiyama

arXiv:1901.01365·cs.LG·March 8, 2019·20 cites

TL;DR

This paper introduces a hierarchical reinforcement learning method that learns latent policy structures via mutual information maximization and advantage-weighted importance sampling, improving performance in continuous control tasks.

Contribution

The paper proposes an HRL framework that learns a discrete latent representation of the state-action space, with option policies learned through mutual information maximization and advantage-weighted importance sampling.

Findings

01  Learned diverse options effectively.
02  Enhanced RL performance in continuous control tasks.
03  Demonstrated the approach's ability to identify meaningful hierarchical structures.

Abstract

Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task. In this paper, we propose an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization. Our approach can be interpreted as a way to learn a discrete and latent representation of the state-action space. To learn option policies that correspond to modes of the advantage function, we introduce advantage-weighted importance sampling. In our HRL method, the gating policy learns to select option policies based on an option-value function, and these option policies are optimized based on the deterministic policy…
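The two mechanisms named in the abstract, a gating policy driven by option values and advantage-based sample weighting, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the softmax form of the gating policy, the exponential transform of the advantage, the `temperature` parameter, the function names, and the toy numbers are all hypothetical.

```python
import math
import random


def softmax_gating(option_values, temperature=1.0):
    """Map per-option values Q(s, o) to selection probabilities.

    Assumption: a softmax over option values; the abstract only says the
    gating policy selects options "based on an option-value function".
    """
    scaled = [q / temperature for q in option_values]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def advantage_weights(advantages, behavior_probs):
    """Self-normalized importance weights that favor high-advantage samples.

    Assumption: weight = exp(advantage) / behavior probability, normalized
    to sum to one. The exponential transform is illustrative only.
    """
    top = max(advantages)
    raw = [math.exp(a - top) / b for a, b in zip(advantages, behavior_probs)]
    total = sum(raw)
    return [r / total for r in raw]


random.seed(0)

# Hypothetical option values for one state with three options.
option_values = [1.0, 2.0, 0.5]
gating_probs = softmax_gating(option_values)
chosen_option = random.choices(range(len(option_values)), weights=gating_probs)[0]

# Hypothetical advantages and behavior-policy probabilities for three samples.
sample_weights = advantage_weights([0.2, -0.1, 1.3], [0.5, 0.3, 0.2])
```

In this sketch the option with the largest value receives the largest selection probability, and the sample with the largest advantage receives the largest weight, which is the qualitative behavior the abstract describes when it says option policies should correspond to modes of the advantage function.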

Code & Models

Repositories

TakaOsa/adInfoHRL (tf, Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Electric Vehicles and Infrastructure