Variational Intrinsic Control Revisited

Taehwan Kwon

arXiv:2010.03281·cs.LG·March 18, 2021

TL;DR

This paper revisits variational intrinsic control (VIC), shows that the intrinsic reward of the implicit-option VIC algorithm is biased in stochastic environments, and proposes two corrections, based on a transitional probability model and a Gaussian mixture model, that aim to recover maximal empowerment.

Contribution

It introduces two methods, one based on a transitional probability model and one on a Gaussian mixture model, that correct the bias in implicit VIC's intrinsic reward in stochastic environments.

Findings

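01 The intrinsic reward of implicit VIC is biased in stochastic environments, which causes convergence to suboptimal solutions

02 The two proposed methods correct this bias so that the agent can reach maximal empowerment

03 Mathematical derivations and experiments support both the bias analysis and the corrections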

Abstract

In this paper, we revisit variational intrinsic control (VIC), an unsupervised reinforcement learning method for finding the largest set of intrinsic options available to an agent. In the original work by Gregor et al. (2016), two VIC algorithms were proposed: one that represents the options explicitly, and one that represents them implicitly. We show that the intrinsic reward used in the latter is subject to bias in stochastic environments, causing convergence to suboptimal solutions. To correct this behavior and achieve maximal empowerment, we propose two methods, based respectively on a transitional probability model and a Gaussian mixture model. We substantiate our claims through rigorous mathematical derivations and experimental analyses.
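As a rough illustration of the quantity the abstract calls empowerment, the sketch below computes the mutual information between an option and the final state it reaches in a toy two-option, two-state setting. Everything in it is an assumption made for illustration and does not come from the paper: the function name `mutual_information`, the uniform option prior, and the two hand-written transition tables. It does not implement VIC or either of the proposed corrections. It only shows that, for a fixed option set, a stochastic environment lowers the information that options carry about final states.

```python
import math

def mutual_information(p_omega, p_sf_given_omega):
    """Mutual information I(Omega; S_f) in nats.

    p_omega: list of option probabilities p(omega).
    p_sf_given_omega: one row per option, giving p(s_f | omega).
    Both are hypothetical toy inputs, not quantities from the paper.
    """
    n_states = len(p_sf_given_omega[0])
    # Marginal over final states: p(s_f) = sum_omega p(omega) p(s_f | omega)
    p_sf = [
        sum(p_o * row[s] for p_o, row in zip(p_omega, p_sf_given_omega))
        for s in range(n_states)
    ]
    mi = 0.0
    for p_o, row in zip(p_omega, p_sf_given_omega):
        for s in range(n_states):
            joint = p_o * row[s]
            if joint > 0.0:  # terms with zero joint probability contribute 0
                mi += joint * math.log(joint / (p_o * p_sf[s]))
    return mi

uniform_prior = [0.5, 0.5]

# Deterministic toy environment: each option always reaches its own state.
deterministic = [[1.0, 0.0], [0.0, 1.0]]

# Stochastic toy environment: each option misses its state 10% of the time.
stochastic = [[0.9, 0.1], [0.1, 0.9]]

mi_det = mutual_information(uniform_prior, deterministic)
mi_sto = mutual_information(uniform_prior, stochastic)
print(f"deterministic: {mi_det:.4f} nats, stochastic: {mi_sto:.4f} nats")
```

In the deterministic table the two options are perfectly distinguishable from their final states, so the mutual information equals the entropy of the option prior, log 2. In the stochastic table some of that information is lost to environment noise.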

Videos

Variational Intrinsic Control Revisited · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Adaptive Dynamic Programming Control