Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning
Shakir Mohamed, Danilo Jimenez Rezende

TL;DR
This paper introduces a scalable method for optimizing mutual information that combines variational inference with deep learning, enabling empowerment-based reinforcement learning directly from visual inputs.
Contribution
The paper proposes a variational approach to mutual information maximization that scales to high-dimensional visual data in reinforcement learning.
Findings
Enables scalable mutual information optimization from pixels to actions.
Integrates variational inference with deep convolutional networks.
Facilitates empowerment-based exploration in reinforcement learning.
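For orientation, the kind of objective summarised above can be written with the standard variational lower bound on mutual information. The notation here is illustrative and not taken from this page: s is the current state, a an action (or action sequence), s' the resulting state, ω(a | s) an exploration distribution over actions, and q(a | s', s) a learned approximation to the true action posterior.

```latex
\begin{align}
I(a; s' \mid s) &= H(a \mid s) - H(a \mid s', s) \\
                &\ge H(a \mid s)
                   + \mathbb{E}_{p(s' \mid a, s)\,\omega(a \mid s)}
                     \left[ \log q(a \mid s', s) \right]
\end{align}
```

The inequality holds because replacing the true posterior p(a | s', s) with any q can only lower the expected log-probability (the gap is a non-negative KL divergence). Maximising the right-hand side over both ω and q needs only samples, which is what makes this family of bounds compatible with neural-network parameterisations.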
Abstract
The mutual information is a core statistical quantity that has applications in all areas of machine learning, whether this is in training of density models over multiple data modalities, in maximising the efficiency of noisy transmission channels, or when learning behaviour policies for exploration by artificial agents. Most learning algorithms that involve optimisation of the mutual information rely on the Blahut-Arimoto algorithm, an enumerative algorithm with exponential complexity that is not suitable for modern machine learning applications. This paper provides a new approach for scalable optimisation of the mutual information by merging techniques from variational inference and deep learning. We develop our approach by focusing on the problem of intrinsically-motivated learning, where the mutual information forms the definition of a well-known internal drive known as…
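To make the baseline named in the abstract concrete, here is a minimal sketch of the classical Blahut-Arimoto iteration for a small discrete channel. It is not the paper's method; it illustrates the enumerative procedure the paper aims to replace. The function name, the `channel[a, s]` layout, and the tolerance settings are assumptions chosen for illustration.

```python
import numpy as np

def blahut_arimoto(channel, tol=1e-9, max_iter=1000):
    """Classical Blahut-Arimoto iteration (illustrative sketch).

    channel[a, s] = p(s | a); each row must sum to 1.
    Returns (capacity in nats, maximising input distribution p(a)).
    """
    channel = np.asarray(channel, dtype=float)
    n_inputs = channel.shape[0]
    p_a = np.full(n_inputs, 1.0 / n_inputs)  # start from a uniform input distribution

    def kl_per_input(p_a):
        # KL(p(s|a) || p(s)) for every input a, using the convention 0 log 0 = 0.
        p_s = p_a @ channel
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(channel > 0, np.log(channel / p_s), 0.0)
        return np.sum(channel * log_ratio, axis=1)

    for _ in range(max_iter):
        # Re-weight each input by how distinguishable its outcomes are.
        new_p_a = p_a * np.exp(kl_per_input(p_a))
        new_p_a /= new_p_a.sum()
        converged = np.max(np.abs(new_p_a - p_a)) < tol
        p_a = new_p_a
        if converged:
            break

    # Mutual information under the final input distribution.
    capacity = float(np.sum(p_a * kl_per_input(p_a)))
    return capacity, p_a

# Usage: a noiseless channel with two inputs has capacity log(2) nats.
capacity, p_a = blahut_arimoto(np.eye(2))
```

Every update touches every (input, outcome) pair. When the inputs are action sequences, the number of rows grows exponentially with the sequence length, which is the scaling problem the abstract refers to.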
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics · Machine Learning and Data Classification
