Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization
Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama

TL;DR
This paper introduces a hierarchical reinforcement learning method that learns latent policy structures via mutual information maximization and advantage-weighted importance sampling, improving performance in continuous control tasks.
Contribution
The paper proposes an HRL framework that learns a discrete latent representation of the state-action space, and option policies tied to it, using mutual information maximization and advantage-weighted importance sampling.
Findings
The method learns diverse option policies.
It improves RL performance on continuous control tasks.
It identifies meaningful hierarchical structure in those tasks.
Abstract
Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task. In this paper, we propose an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization. Our approach can be interpreted as a way to learn a discrete and latent representation of the state-action space. To learn option policies that correspond to modes of the advantage function, we introduce advantage-weighted importance sampling. In our HRL method, the gating policy learns to select option policies based on an option-value function, and these option policies are optimized based on the deterministic policy…
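The two mechanisms named in the abstract, weighting samples by advantage and gating over options by an option-value function, can be illustrated with a minimal sketch. This is not the authors' implementation. The exponential form of the weights, the softmax gating rule, the `temperature` parameter, and the function names `advantage_weights` and `select_option` are all assumptions chosen for illustration; the abstract only states that the weights depend on the advantage and that the gating policy is based on an option-value function.

```python
import numpy as np


def advantage_weights(advantages, temperature=1.0):
    """Self-normalized sample weights that increase with the advantage.

    Assumption: an exponential transform of the advantage. The paper may
    use a different monotone function.
    """
    a = np.asarray(advantages, dtype=float)
    z = (a - a.max()) / temperature  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def select_option(option_values, temperature=1.0, rng=None):
    """Sample an option index from a softmax over option values Q(s, o).

    Assumption: softmax gating for a single state. `option_values` holds
    one value per option.
    """
    q = np.asarray(option_values, dtype=float)
    z = (q - q.max()) / temperature
    probs = np.exp(z)
    probs /= probs.sum()
    rng = np.random.default_rng() if rng is None else rng
    option = int(rng.choice(len(q), p=probs))
    return option, probs


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(advantage_weights([0.0, 1.0, 2.0]))
    print(select_option([0.1, 2.0, -1.0], rng=np.random.default_rng(0)))
```

Under these assumptions, samples with higher advantage contribute more when fitting the latent variable, and options with higher estimated value are selected more often while every option keeps a nonzero selection probability.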
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Electric Vehicles and Infrastructure
