Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information

Takayuki Osa; Voot Tangkaratt; Masashi Sugiyama

arXiv:2103.07084·stat.ML·April 14, 2022

TL;DR

This paper introduces a novel reinforcement learning method that directly maximizes the variational lower bound of mutual information to learn a diverse set of solutions, improving robustness and adaptation.

Contribution

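It proposes training a policy conditioned on a continuous or discrete latent variable by directly maximizing a variational lower bound of the mutual information. This avoids the gradient-estimator bias, induced by value function approximation, that affects earlier methods that use mutual information as an unsupervised reward.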

Findings

01 Successfully learns an infinite set of diverse solutions

02 Enables more effective few-shot adaptation

03 Demonstrates superior performance on robot locomotion tasks

Abstract

Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an approach often suffers from the bias of the gradient estimator induced by value function approximation. In this study, we propose a novel method that can learn diverse solutions without suffering the bias problem. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies. Through extensive experiments…
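
As a rough illustration of the objective the abstract describes, the sketch below evaluates a variational lower bound on mutual information for a toy discrete case. It is not the authors' implementation: the tables `joint`, `posterior`, and `q_rough` and both function names are hypothetical, a single discrete variable `x` stands in for a state-action pair, and the bound used is the standard form H(z) + E[log q(z | x)] ≤ I(x; z), which is assumed here rather than taken from the paper.

```python
import numpy as np


def mutual_information(joint):
    """Exact I(X; Z) in nats for a joint table p(x, z) (rows: x, columns: z)."""
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x), shape (n_x, 1)
    pz = joint.sum(axis=0, keepdims=True)   # marginal p(z), shape (1, n_z)
    mask = joint > 0                        # skip zero cells to avoid log(0)
    ratio = joint[mask] / (px @ pz)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))


def variational_lower_bound(joint, q):
    """H(Z) + E_{p(x, z)}[log q(z | x)], a lower bound on I(X; Z).

    q[x, z] is any conditional distribution over z given x (rows sum to 1)
    that is positive wherever joint[x, z] is positive.
    """
    pz = joint.sum(axis=0)
    entropy_z = -float(np.sum(pz[pz > 0] * np.log(pz[pz > 0])))
    mask = joint > 0
    return entropy_z + float(np.sum(joint[mask] * np.log(q[mask])))


# Hypothetical toy joint over a stand-in for (state, action) and a binary latent z.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

# True posterior p(z | x): each row of the joint divided by its marginal p(x).
posterior = joint / joint.sum(axis=1, keepdims=True)

# A cruder approximate posterior, standing in for an imperfectly trained model.
q_rough = np.array([[0.6, 0.4],
                    [0.4, 0.6]])

exact = mutual_information(joint)
bound_tight = variational_lower_bound(joint, posterior)
bound_loose = variational_lower_bound(joint, q_rough)

print(f"exact MI          : {exact:.4f}")
print(f"bound, true q     : {bound_tight:.4f}")
print(f"bound, rough q    : {bound_loose:.4f}")
```

In this toy setting the bound matches the exact mutual information when `q` is the true posterior and falls below it for the cruder `q_rough`. In a deep RL setting, `q` would presumably be a learned model and the bound would be maximized with respect to the policy parameters, but those details are assumptions that the abstract does not spell out.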

Taxonomy

Topics: Reinforcement Learning in Robotics · Viral Infectious Diseases and Gene Expression in Insects