Pretrained Visual Representations in Reinforcement Learning

Emlyn Williams; Athanasios Polydoros

arXiv:2407.17238·cs.RO·July 25, 2024

TL;DR

This paper compares training visual RL agents from scratch with using pre-trained visual representations (PVRs). PVRs reduce training time and replay buffer size, but whether they also improve task performance depends on the task.

Contribution

It provides a systematic comparison of pre-trained visual representations versus training from scratch in visual RL, highlighting trade-offs and the role of exploration.

Findings

01. PVRs reduce training time and replay buffer size.

02. Performance benefits of PVRs are task-dependent.

03. Dormant ratio correlates with model performance.
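
To make the replay buffer finding concrete, here is a rough back-of-the-envelope sketch. It assumes (this is not stated above) that the saving comes from storing a frozen encoder's embedding in the buffer instead of raw pixels. The observation shape, embedding width, data types, and buffer capacity are all hypothetical values chosen for illustration, not figures from the paper.

```python
def bytes_per_observation(shape, bytes_per_element):
    """Return the storage size in bytes of one observation with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def buffer_gib(capacity, obs_bytes):
    """Return the observation storage of a buffer with `capacity` entries, in GiB."""
    return capacity * obs_bytes / 2**30


# Hypothetical pixel observation: 3 stacked 84x84 RGB frames stored as uint8.
PIXEL_OBS_SHAPE = (9, 84, 84)
# Hypothetical embedding: a 512-dimensional feature vector stored as float32.
EMBEDDING_SHAPE = (512,)

pixel_bytes = bytes_per_observation(PIXEL_OBS_SHAPE, 1)
embed_bytes = bytes_per_observation(EMBEDDING_SHAPE, 4)

if __name__ == "__main__":
    capacity = 1_000_000  # hypothetical buffer capacity
    print(f"pixels:     {buffer_gib(capacity, pixel_bytes):.2f} GiB")
    print(f"embeddings: {buffer_gib(capacity, embed_bytes):.2f} GiB")
```

Under these assumed sizes the embedding is far smaller than the stacked frames, which is one plausible route to a smaller buffer. The actual ratio depends on the encoder and image resolution used.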

Abstract

Visual reinforcement learning (RL) has made significant progress in recent years, but the choice of visual feature extractor remains a crucial design decision. This paper compares the performance of RL algorithms that train a convolutional neural network (CNN) from scratch with those that utilize pre-trained visual representations (PVRs). We evaluate the Dormant Ratio Minimization (DRM) algorithm, a state-of-the-art visual RL method, against three PVRs: ResNet18, DINOv2, and Visual Cortex (VC). We use the Metaworld Push-v2 and Drawer-Open-v2 tasks for our comparison. Our results show that the choice of training from scratch compared to using PVRs for maximising performance is task-dependent, but PVRs offer advantages in terms of reduced replay buffer size and faster training times. We also identify a strong correlation between the dormant ratio and model performance, highlighting the…
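
The abstract refers to the dormant ratio without defining it. The sketch below uses the definition common in the dormant-neuron literature, which is an assumption here since the text above does not spell it out: a neuron's score is its mean absolute activation divided by the average of that quantity over its layer, and a neuron counts as dormant when its score is at most a threshold `tau`. The function names and the list-of-lists input format are hypothetical, and this is not presented as the authors' implementation.

```python
def layer_scores(activations):
    """Normalized per-neuron scores for one layer.

    `activations` is a batch of activation vectors (one list per input),
    all of the same length.
    """
    n_inputs = len(activations)
    n_neurons = len(activations[0])
    # Mean absolute activation of each neuron over the batch.
    mean_abs = [
        sum(abs(row[i]) for row in activations) / n_inputs
        for i in range(n_neurons)
    ]
    layer_mean = sum(mean_abs) / n_neurons
    if layer_mean == 0.0:
        # Every neuron is silent; treat all scores as zero.
        return [0.0] * n_neurons
    return [m / layer_mean for m in mean_abs]


def dormant_ratio(layers, tau=0.0):
    """Fraction of neurons across all layers whose score is <= tau."""
    dormant = 0
    total = 0
    for activations in layers:
        scores = layer_scores(activations)
        dormant += sum(1 for s in scores if s <= tau)
        total += len(scores)
    return dormant / total


if __name__ == "__main__":
    # Toy layer: batch of 2 inputs, 4 neurons, two of which never fire.
    toy_layer = [[1.0, 0.0, 2.0, 0.0],
                 [3.0, 0.0, 0.0, 0.0]]
    print(dormant_ratio([toy_layer], tau=0.0))
```

In practice the activations would be collected from the network's hidden layers on a batch of observations. A higher ratio means more of the network is effectively inactive.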
