Towards Principled Representation Learning from Videos for Reinforcement Learning
Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford

TL;DR
This paper investigates theoretical foundations for learning latent state representations from video data for reinforcement learning, analyzing different noise settings and evaluating methods like autoencoding and contrastive learning.
Contribution
It provides the first theoretical analysis of representation learning from videos for RL, including upper bounds and an account of how exogenous noise can exponentially increase sample complexity.
Findings
Temporal contrastive learning and forward modeling can learn latent states under iid noise.
Exogenous, non-iid noise can exponentially increase sample complexity.
Experimental results align with theoretical predictions.
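To make the temporal contrastive idea in the findings above concrete, here is a minimal sketch of how training pairs might be built from videos. It is a generic illustration, not the paper's exact procedure: the function name `make_temporal_contrastive_pairs`, the gap parameter `k`, and the 50/50 split between real and fake pairs are all assumptions made for this example. A classifier trained to tell real pairs from fake ones would then supply the learned representation.

```python
import random

def make_temporal_contrastive_pairs(videos, k, rng):
    """Build (frame, other_frame, label) triples from a list of videos.

    label 1: other_frame is the frame k steps after `frame` in the same video.
    label 0: other_frame is drawn uniformly from all frames of all videos.
    Hypothetical helper for illustration; not taken from the paper.
    """
    all_frames = [frame for video in videos for frame in video]
    pairs = []
    for video in videos:
        for t in range(len(video) - k):
            if rng.random() < 0.5:
                # Real pair: two frames from the same video, k steps apart.
                pairs.append((video[t], video[t + k], 1))
            else:
                # Fake pair: the second frame is random, so it may by chance
                # equal the true successor, as in most contrastive setups.
                pairs.append((video[t], rng.choice(all_frames), 0))
    return pairs

# Toy "videos": integers stand in for image frames.
videos = [[0, 1, 2, 3], [10, 11, 12, 13]]
pairs = make_temporal_contrastive_pairs(videos, k=1, rng=random.Random(0))
```

Each video of length 4 contributes 3 anchor frames when `k=1`, so the toy example yields 6 triples in total.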
Abstract
We study pre-training representations for decision-making using video data, which is abundantly available for tasks such as game agents and software testing. Even though significant empirical advances have been made on this problem, a theoretical understanding remains absent. We initiate the theoretical investigation into principled approaches for representation learning and focus on learning the latent state representations of the underlying MDP using video data. We study two types of settings: one where there is iid noise in the observation, and a more challenging setting where there is also the presence of exogenous noise, which is non-iid noise that is temporally correlated, such as the motion of people or cars in the background. We study three commonly used approaches: autoencoding, temporal contrastive learning, and forward modeling. We prove upper bounds for temporal contrastive…
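The difference between the two noise settings described in the abstract can be sketched with a toy example. This is only an illustration under stated assumptions: Gaussian draws stand in for iid noise, and a one-dimensional random walk stands in for temporally correlated exogenous noise such as a person moving in the background. Both function names are hypothetical and neither model comes from the paper.

```python
import random

def iid_noise(num_frames, rng):
    """Noise drawn independently for every frame (assumed Gaussian here)."""
    return [rng.gauss(0.0, 1.0) for _ in range(num_frames)]

def exogenous_noise(num_frames, rng, step=1):
    """Temporally correlated noise: a background object's position that
    moves one step left or right per frame (assumed random walk)."""
    position = 0
    trace = []
    for _ in range(num_frames):
        position += rng.choice([-step, step])
        trace.append(position)
    return trace

rng = random.Random(0)
iid_trace = iid_noise(8, rng)
exo_trace = exogenous_noise(8, rng)
```

In the random-walk trace, each value depends on the previous one, so consecutive frames carry predictable information that is unrelated to the agent's latent state. That kind of temporal structure is what separates exogenous noise from the iid case.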
Taxonomy
Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Multimodal Machine Learning Applications
Methods: Focus · Contrastive Learning
