Understanding Self-Predictive Learning for Reinforcement Learning

Yunhao Tang; Zhaohan Daniel Guo; Pierre Harvey Richemond; Bernardo Ávila Pires; Yash Chandak; Rémi Munos; Mark Rowland; Mohammad Gheshlaghi Azar; Charline Le Lan; Clare Lyle; András György; Shantanu Thakoor; Will Dabney; Bilal Piot; Daniele Calandriello; Michal Valko

arXiv:2212.03319 · cs.LG · December 8, 2022 · 1 citation

TL;DR

This paper studies the learning dynamics of self-predictive algorithms in reinforcement learning. It shows that the design of the optimization, namely a faster-paced predictor and semi-gradient updates on the representation, is what keeps learning away from trivial solutions, and it proposes a novel bidirectional method with theoretical and empirical validation.

Contribution

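The paper provides a theoretical analysis of self-predictive learning dynamics, identifies the optimization choices that prevent representation collapse (a faster-paced predictor and semi-gradient updates on the representation), and introduces a new bidirectional algorithm with demonstrated effectiveness.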

Findings

01. Optimization dynamics prevent trivial solutions.
02. Self-predictive learning captures transition dynamics.
03. Bidirectional learning improves representation quality.

Abstract

We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite their recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet converging to such solutions is clearly undesirable. Our central insight is that careful design of the optimization dynamics is critical to learning meaningful representations. We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse. Then, in an idealized setup, we show that self-predictive learning dynamics carry out a spectral decomposition of the state transition matrix, effectively capturing information of…
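The two ingredients named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and everything in it is an assumption chosen for illustration: tabular one-hot states drawn uniformly, a linear representation `Phi`, a linear predictor `P`, a random row-stochastic transition matrix `T`, and the names `optimal_predictor`, `semi_gradient_step`, and `run`. "Faster-paced" predictor optimization is modeled in the limit by solving for the least-squares predictor at every step. The semi-gradient update treats the target `T @ Phi` as a constant.

```python
import numpy as np


def optimal_predictor(Phi, T):
    """Least-squares predictor P minimizing ||Phi @ P - T @ Phi||_F^2.

    Stands in for a predictor that is optimized much faster than Phi.
    """
    P, *_ = np.linalg.lstsq(Phi, T @ Phi, rcond=None)
    return P


def semi_gradient_step(Phi, T, lr):
    """One semi-gradient update of the representation Phi.

    Loss (uniform state distribution, one-hot states):
        0.5 / n * ||Phi @ P - stop_gradient(T @ Phi)||_F^2
    Only the prediction branch Phi @ P is differentiated.
    """
    n = Phi.shape[0]
    P = optimal_predictor(Phi, T)
    target = T @ Phi  # held constant: no gradient flows through the target
    grad = (Phi @ P - target) @ P.T / n
    return Phi - lr * grad


def run(n_states=6, dim=2, steps=500, lr=0.01, seed=0):
    """Run the sketch and return the initial and final representations."""
    rng = np.random.default_rng(seed)
    T = rng.random((n_states, n_states))
    T /= T.sum(axis=1, keepdims=True)  # make each row a distribution
    Phi0 = rng.standard_normal((n_states, dim))
    Phi = Phi0.copy()
    for _ in range(steps):
        Phi = semi_gradient_step(Phi, T, lr)
    return Phi0, Phi


if __name__ == "__main__":
    Phi0, Phi = run()
    gram0, gram = Phi0.T @ Phi0, Phi.T @ Phi
    drift = np.linalg.norm(gram - gram0) / np.linalg.norm(gram0)
    print("relative drift of Phi^T Phi:", drift)
```

With the least-squares predictor, the residual `Phi @ P - T @ Phi` is orthogonal to the columns of `Phi`, so each update changes `Phi.T @ Phi` only at second order in the step size. In this sketch the representation therefore cannot shrink to a constant, which is one way to see why the predictor pace and the stop-gradient matter for avoiding collapse.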

Code & Models

Datasets

misovalko/my-research-papers
dataset · 21 downloads

Videos

Understanding Self-Predictive Learning for Reinforcement Learning · SlidesLive

Taxonomy

Topics: Neural Networks and Reservoir Computing · Model Reduction and Neural Networks · Reinforcement Learning in Robotics