Towards Principled Representation Learning from Videos for Reinforcement Learning

Dipendra Misra; Akanksha Saran; Tengyang Xie; Alex Lamb; John Langford

arXiv:2403.13765 · cs.LG · March 21, 2024 · 1 citation

TL;DR

This paper investigates the theoretical foundations of learning latent state representations from video data for reinforcement learning. It analyzes two noise settings (iid noise only, and iid plus exogenous noise) and evaluates three approaches: autoencoding, temporal contrastive learning, and forward modeling.

Contribution

It initiates the theoretical analysis of representation learning from videos for RL, including upper bounds and an analysis of the challenges posed by exogenous noise.

Findings

01

Temporal contrastive learning and forward modeling can learn latent states under iid noise.

02

Exogenous, non-iid noise can exponentially increase sample complexity.

03

Experimental results align with theoretical predictions.

Abstract

We study pre-training representations for decision-making using video data, which is abundantly available for tasks such as game agents and software testing. Even though significant empirical advances have been made on this problem, a theoretical understanding remains absent. We initiate the theoretical investigation into principled approaches for representation learning and focus on learning the latent state representations of the underlying MDP using video data. We study two types of settings: one where there is iid noise in the observation, and a more challenging setting where there is also the presence of exogenous noise, which is non-iid noise that is temporally correlated, such as the motion of people or cars in the background. We study three commonly used approaches: autoencoding, temporal contrastive learning, and forward modeling. We prove upper bounds for temporal contrastive…
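As a rough illustration of the temporal contrastive approach named in the abstract, the sketch below builds the kind of training set such a method could use: each frame is paired once with the frame that truly follows it (label 1) and once with a randomly drawn frame (label 0), and a classifier would then be trained to tell the two apart. This is a minimal sketch under stated assumptions, not the paper's algorithm or the API of microsoft/intrepid. The function name `make_contrastive_pairs`, the gap parameter `k`, the uniform negative sampling, and the toy string "frames" are all hypothetical choices made for illustration.

```python
import random


def make_contrastive_pairs(videos, k=1, seed=0):
    """Build (frame, candidate, label) triples from a list of videos.

    label 1: candidate is the frame k steps after `frame` in the same video.
    label 0: candidate is drawn uniformly from all frames of all videos
             (an assumed negative-sampling scheme; by chance it can
             coincide with the true successor).
    """
    rng = random.Random(seed)
    all_frames = [frame for video in videos for frame in video]
    triples = []
    for video in videos:
        for t in range(len(video) - k):
            # Positive example: a real temporal transition.
            triples.append((video[t], video[t + k], 1))
            # Negative example: a frame sampled without regard to time.
            triples.append((video[t], rng.choice(all_frames), 0))
    return triples


# Toy "videos": strings stand in for image observations.
videos = [["a0", "a1", "a2"], ["b0", "b1", "b2", "b3"]]
triples = make_contrastive_pairs(videos, k=1)
```

In practice the frames would be image tensors and the triples would feed a binary classifier whose internal encoder serves as the learned representation. That training step is omitted here.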

Code & Models

Repositories

microsoft/intrepid · PyTorch · Official

Videos

Towards Principled Representation Learning from Videos for Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Multimodal Machine Learning Applications

Methods: Focus · Contrastive Learning