Pre-trained Visual Dynamics Representations for Efficient Policy Learning

Hao Luo, Bohan Zhou, and Zongqing Lu

arXiv:2411.03169 · cs.CV · November 6, 2024

Open Access

TL;DR

This paper introduces PVDR, a pre-training method using video prediction with a Transformer-based CVAE to learn visual dynamics representations, improving policy learning in robotics tasks by bridging the domain gap between videos and downstream RL applications.

Contribution

The paper proposes a novel pre-training approach using video prediction and a Transformer-based CVAE to learn visual dynamics representations for reinforcement learning.

Findings

1. PVDR improves policy learning efficiency in robotics tasks.
2. Pre-trained visual dynamics representations transfer well to downstream tasks.
3. The method effectively bridges the domain gap between in-the-wild videos and RL environments.

Abstract

Pre-training for Reinforcement Learning (RL) with purely video data is a valuable yet challenging problem. Although in-the-wild videos are readily available and contain a vast amount of prior world knowledge, the absence of action annotations and the common domain gap with downstream tasks hinder their use for RL pre-training. To address this challenge, we propose Pre-trained Visual Dynamics Representations (PVDR), which bridge the domain gap between videos and downstream tasks for efficient policy learning. Adopting video prediction as the pre-training task, we use a Transformer-based Conditional Variational Autoencoder (CVAE) to learn visual dynamics representations. The pre-trained representations capture prior knowledge of visual dynamics from the videos. This abstract prior knowledge can be readily adapted to downstream tasks and aligned…
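
The abstract does not state PVDR's training objective, so the sketch below only illustrates the generic shape of a CVAE loss: a reconstruction term plus a KL term between an approximate posterior and a conditional prior. It assumes diagonal Gaussian distributions and a squared-error reconstruction term, and it uses plain Python lists in place of the Transformer encoder and decoder outputs. The names `gaussian_kl`, `reparameterize`, `cvae_loss`, and the `beta` weight are hypothetical and do not come from the paper.

```python
import math
import random


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians given as means and log-variances."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def cvae_loss(target, prediction, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Squared-error reconstruction term plus a beta-weighted KL term (assumed form)."""
    recon = sum((t - p) ** 2 for t, p in zip(target, prediction))
    return recon + beta * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```

In a video-prediction setting, `target` would stand for the future frames (or their features), `prediction` for the decoder output conditioned on past frames and a sampled latent, and the prior parameters would be computed from the past frames alone. Those roles are assumptions about how such a model is typically wired, not details confirmed by the abstract.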

Taxonomy

Topics: Data Visualization and Analytics