A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

Khimya Khetarpal; Zhaohan Daniel Guo; Bernardo Avila Pires; Yunhao Tang; Clare Lyle; Mark Rowland; Nicolas Heess; Diana Borsa; Arthur Guez; Will Dabney

arXiv:2406.02035·cs.LG·June 5, 2024

Open Access

TL;DR

This paper develops a unified theoretical framework for action-conditional self-predictive reinforcement learning, bridging the gap between existing theory and practical algorithms, and demonstrating improved empirical performance across various settings.

Contribution

The paper introduces an action-conditional self-predictive objective (BYOL-AC), analyzes its convergence properties, and unifies the different objectives through model-based and model-free perspectives.

Findings

01. BYOL-AC outperforms existing methods in diverse RL environments.

02. Theoretical analysis reveals convergence properties and relationships between objectives.

03. Proposes a variance-like objective (BYOL-VAR) with favorable properties.

Abstract

Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model for self-predictive representation learning under the simplifying assumption that the algorithm depends on a fixed policy (BYOL-$\Pi$); this assumption is at odds with practical instantiations of such algorithms, which explicitly condition their predictions on future actions. In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework, characterizing its convergence properties and highlighting important distinctions…
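
To make the idea of an action-conditional self-predictive objective concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a tabular setting with one-hot states, a linear representation `phi`, one linear latent predictor per action (`preds[a]`), known transition matrices `T[a]`, and uniform weighting over states and actions. All of these names and choices are illustrative assumptions. The bootstrapped target is the same representation applied to the next state, treated as a constant (a stop-gradient), so the gradient flows only through the prediction branch.

```python
import numpy as np

def byol_ac_loss_and_grads(phi, preds, T):
    """Sketch of an action-conditional self-predictive loss (illustrative only).

    phi:   (n_states, k) latent representation, one row per state.
    preds: (n_actions, k, k) one linear latent predictor per action.
    T:     (n_actions, n_states, n_states) row-stochastic transition matrices.
    Returns the loss and semi-gradients w.r.t. phi and preds.
    """
    n_actions, n_states, _ = T.shape
    target = phi.copy()                    # stop-gradient: held constant
    target_sq = np.sum(target ** 2, axis=1)
    scale = 1.0 / (n_actions * n_states)   # uniform states and actions (assumed)

    loss = 0.0
    g_phi = np.zeros_like(phi)
    g_preds = np.zeros_like(preds)
    for a in range(n_actions):
        pred = phi @ preds[a]              # predicted next latent per state
        mean_target = T[a] @ target        # expected next latent under action a
        # E_y ||pred_x - target_y||^2, expanded so no sampling is needed.
        per_state = (np.sum(pred ** 2, axis=1)
                     - 2.0 * np.sum(pred * mean_target, axis=1)
                     + T[a] @ target_sq)
        loss += scale * np.sum(per_state)
        g_pred = 2.0 * scale * (pred - mean_target)
        g_preds[a] = phi.T @ g_pred        # gradient for this action's predictor
        g_phi += g_pred @ preds[a].T       # online branch only (target is fixed)
    return loss, g_phi, g_preds

# Hypothetical toy problem: 6 states, 2 actions, 3 latent dimensions.
rng = np.random.default_rng(0)
n_states, n_actions, k = 6, 2, 3
T = rng.random((n_actions, n_states, n_states))
T /= T.sum(axis=2, keepdims=True)          # make each row a distribution
phi = rng.normal(size=(n_states, k))
preds = np.stack([np.eye(k)] * n_actions)

loss, g_phi, g_preds = byol_ac_loss_and_grads(phi, preds, T)

# One semi-gradient step on both the representation and the predictors.
lr = 0.1
phi_next = phi - lr * g_phi
preds_next = preds - lr * g_preds
```

Under the same assumptions, a fixed-policy variant in the spirit of BYOL-$\Pi$ would use a single predictor and a single policy-averaged transition matrix instead of one predictor per action.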

Taxonomy

Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics