Pretrained Visual Representations in Reinforcement Learning
Emlyn Williams, Athanasios Polydoros

TL;DR
This paper compares training visual RL agents from scratch with using pre-trained visual representations (PVRs). PVRs reduce training time and replay buffer size, but which approach performs better depends on the task.
Contribution
It provides a systematic comparison of pre-trained visual representations versus training from scratch in visual RL, highlighting trade-offs and the role of exploration.
Findings
PVRs reduce training time and replay buffer size.
Performance benefits of PVRs are task-dependent.
Dormant ratio correlates with model performance.
Abstract
Visual reinforcement learning (RL) has made significant progress in recent years, but the choice of visual feature extractor remains a crucial design decision. This paper compares the performance of RL algorithms that train a convolutional neural network (CNN) from scratch with those that utilize pre-trained visual representations (PVRs). We evaluate the Dormant Ratio Minimization (DRM) algorithm, a state-of-the-art visual RL method, against three PVRs: ResNet18, DINOv2, and Visual Cortex (VC). We use the Metaworld Push-v2 and Drawer-Open-v2 tasks for our comparison. Our results show that the choice of training from scratch compared to using PVRs for maximising performance is task-dependent, but PVRs offer advantages in terms of reduced replay buffer size and faster training times. We also identify a strong correlation between the dormant ratio and model performance, highlighting the…
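The abstract refers to the dormant ratio without defining it. The sketch below shows one common formulation from the dormant-neuron literature, and it may differ in detail from what the paper actually computes. A neuron is scored by its mean absolute activation over a batch, normalised by the average of that quantity across its layer. It is counted as dormant when the score is at most a threshold `tau`. The function name `dormant_ratio`, the input layout (one `batch x neurons` array per layer), and the default `tau=0.025` are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np


def dormant_ratio(layer_activations, tau=0.025):
    """Fraction of neurons whose normalised activation score is <= tau.

    layer_activations: iterable of arrays, each shaped (batch, neurons),
    holding one layer's post-activation outputs for a batch of inputs.
    tau: dormancy threshold (hypothetical default, not from the paper).
    """
    dormant = 0
    total = 0
    for acts in layer_activations:
        acts = np.asarray(acts, dtype=np.float64)
        # Mean absolute activation of each neuron over the batch.
        mean_abs = np.abs(acts).mean(axis=0)
        # Average of that quantity over all neurons in the layer.
        layer_mean = mean_abs.mean()
        if layer_mean > 0:
            scores = mean_abs / layer_mean
        else:
            # A layer that is entirely silent counts as fully dormant.
            scores = np.zeros_like(mean_abs)
        dormant += int((scores <= tau).sum())
        total += scores.size
    return dormant / total


# Toy layer with four neurons, two of which never fire on this batch.
toy_layer = np.array([[1.0, 0.0, 2.0, 0.0],
                      [3.0, 0.0, 2.0, 0.0]])
print(dormant_ratio([toy_layer]))  # 0.5
```

In practice the activations would be collected from the agent's networks on a batch of observations, for example with forward hooks. A ratio near 1 indicates that most units contribute little to the output.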
