Bridging State and History Representations: Understanding Self-Predictive RL
Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon

TL;DR
This paper unifies many representation learning methods in reinforcement learning under the concept of self-predictive abstraction, and provides theoretical insights and a minimalist algorithm for learning such representations in both MDPs and POMDPs.
Contribution
The paper shows that many state and history representation methods in RL share a common foundation in self-predictive abstraction, analyzes the objectives and optimization techniques used to learn them, and proposes a simple algorithm for learning self-predictive representations.
Findings
Unified view of state and history representations as self-predictive abstractions
Theoretical analysis of widely adopted objectives and optimization techniques, such as stop-gradient
Algorithm validated across standard MDPs, MDPs with distractors, and POMDPs
Abstract
Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for…
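To make the stop-gradient idea mentioned in the abstract concrete, the following is a minimal sketch of a latent self-prediction loss with a stop-gradient target. It is not the authors' implementation. The linear encoder `W_enc`, the linear latent model `W_dyn`, the omission of actions, and the function name `self_predictive_loss_and_grads` are all assumptions made for illustration. Gradients are written out by hand in NumPy so that the effect of the stop-gradient is explicit: the encoder receives gradient only through the predicting branch, never through the target.

```python
import numpy as np


def self_predictive_loss_and_grads(W_enc, W_dyn, obs, next_obs):
    """Latent self-prediction loss with a stop-gradient target (linear sketch).

    Hypothetical setup: z = obs @ W_enc is the latent, z @ W_dyn predicts the
    next latent, and next_obs @ W_enc is the target. Actions are omitted.
    """
    batch = obs.shape[0]
    z = obs @ W_enc              # online latent, shape (batch, k)
    target = next_obs @ W_enc    # target latent, treated as a constant
    pred = z @ W_dyn             # predicted next latent, shape (batch, k)

    err = pred - target
    loss = np.sum(err ** 2) / batch

    # Backward pass written by hand.
    g_pred = 2.0 * err / batch               # dloss / dpred
    grad_dyn = z.T @ g_pred                  # dloss / dW_dyn
    # Stop-gradient: only the path obs -> z -> pred contributes here.
    # The path next_obs -> target is deliberately ignored.
    grad_enc = obs.T @ (g_pred @ W_dyn.T)    # dloss / dW_enc (online branch)
    return loss, grad_enc, grad_dyn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(size=(8, 4))
    next_obs = rng.normal(size=(8, 4))
    W_enc = rng.normal(size=(4, 3)) * 0.1
    W_dyn = np.eye(3)

    # A few plain gradient steps on both parameter sets.
    for _ in range(5):
        loss, g_enc, g_dyn = self_predictive_loss_and_grads(
            W_enc, W_dyn, obs, next_obs
        )
        W_enc -= 0.01 * g_enc
        W_dyn -= 0.01 * g_dyn
        print(f"loss = {loss:.6f}")
```

Without the stop-gradient, `grad_enc` would include an extra term from the target branch. In an autodiff framework the same effect is usually obtained by detaching the target before computing the loss.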
Taxonomy
Topics: Topic Modeling
Methods: Sparse Evolutionary Training
