Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning

Manuel Goulão, Arlindo L. Oliveira

arXiv:2209.10901 · cs.LG · July 20, 2023

TL;DR

This paper investigates pretraining Vision Transformers with self-supervised methods for reinforcement learning, emphasizing the importance of temporal relations and demonstrating improved data efficiency and richer representations in Atari environments.

Contribution

The paper extends VICReg with a temporal order verification task for self-supervised pretraining of Vision Transformers for RL, which leads to better representations and performance.
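To make the temporal order verification idea concrete, here is a minimal sketch of how training samples for such a task might be built. It assumes a window of three consecutive observations, a 50/50 split between correctly ordered and shuffled windows, and a binary classifier trained with cross-entropy on a single logit. These choices, and the names `make_tov_sample`, `WRONG_ORDERS`, and `tov_bce`, are illustrative assumptions, not the paper's exact recipe; the official mgoulao/tov-vicreg repository is the reference.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical: every non-identity permutation of a 3-frame window counts as "wrong order".
WRONG_ORDERS = [(0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]


def make_tov_sample(frames: Sequence, rng: random.Random) -> Tuple[List, int]:
    """Draw 3 consecutive frames; return (window, label), label 1 iff in temporal order."""
    if len(frames) < 3:
        raise ValueError("need at least 3 consecutive frames")
    t = rng.randrange(len(frames) - 2)            # start index of the window
    window = [frames[t], frames[t + 1], frames[t + 2]]
    if rng.random() < 0.5:
        return window, 1                          # positive: true temporal order
    order = rng.choice(WRONG_ORDERS)              # negative: guaranteed shuffled order
    return [window[i] for i in order], 0


def tov_bce(logit: float, label: int) -> float:
    """Numerically stable binary cross-entropy on a raw classifier logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))
```

In a full pipeline, each frame in the window would presumably be encoded by the shared ViT encoder first, with a small head on the concatenated embeddings producing the logit passed to `tov_bce`. Because predicting order requires information about what changes between frames, this auxiliary loss pushes the encoder toward temporally informative features.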

Findings

1. Self-supervised pretraining improves RL data efficiency.
2. Temporal order verification enhances representation quality.
3. The pretrained encoder yields richer, more focused attention maps.

Abstract

The Vision Transformer architecture has been shown to be competitive in computer vision (CV), where it has dethroned convolution-based networks on several benchmarks. Nevertheless, convolutional neural networks (CNNs) remain the preferred architecture for the representation module in reinforcement learning. In this work, we study pretraining a Vision Transformer using several state-of-the-art self-supervised methods and assess the quality of the learned representations. To show the importance of the temporal dimension in this context, we propose an extension of VICReg that better captures temporal relations between observations by adding a temporal order verification task. Our results show that all methods are effective in learning useful representations and avoiding representational collapse for observations from the Arcade Learning Environment (ALE), which leads to improvements in data…
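The proposed extension builds on the VICReg objective, which combines three terms: an invariance term (mean squared error between the embeddings of two views), a variance term (a hinge that keeps each embedding dimension's standard deviation above 1, the part that guards against representational collapse), and a covariance term (penalizing off-diagonal covariance entries to decorrelate dimensions). The NumPy sketch below follows the formulation of the original VICReg paper with its commonly used weights (25, 25, 1). It is not taken from the tov-vicreg repository, and the function names are hypothetical.

```python
import numpy as np


def off_diagonal_sq_sum(c: np.ndarray) -> float:
    """Sum of squared off-diagonal entries of a square matrix."""
    return float((c ** 2).sum() - (np.diag(c) ** 2).sum())


def vicreg_loss(z_a: np.ndarray, z_b: np.ndarray,
                sim_coeff: float = 25.0, std_coeff: float = 25.0,
                cov_coeff: float = 1.0, eps: float = 1e-4) -> float:
    """VICReg loss for two batches of embeddings of shape (n, d)."""
    n, d = z_a.shape
    # Invariance: embeddings of the two views should match.
    inv = float(np.mean((z_a - z_b) ** 2))
    # Center each batch per dimension.
    za = z_a - z_a.mean(axis=0)
    zb = z_b - z_b.mean(axis=0)
    # Variance: hinge pushing each dimension's std above 1 (anti-collapse).
    std_a = np.sqrt(za.var(axis=0, ddof=1) + eps)
    std_b = np.sqrt(zb.var(axis=0, ddof=1) + eps)
    var = float(np.mean(np.maximum(0.0, 1.0 - std_a)) / 2
                + np.mean(np.maximum(0.0, 1.0 - std_b)) / 2)
    # Covariance: penalize correlations between different dimensions.
    cov_a = za.T @ za / (n - 1)
    cov_b = zb.T @ zb / (n - 1)
    cov = off_diagonal_sq_sum(cov_a) / d + off_diagonal_sq_sum(cov_b) / d
    return sim_coeff * inv + std_coeff * var + cov_coeff * cov
```

A fully collapsed batch (every embedding identical) has zero invariance and covariance terms but still pays almost the full variance penalty, which is why the variance term, rather than the invariance term, is what keeps the representation from collapsing.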

Code & Models

Repositories

mgoulao/tov-vicreg (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Advanced Memory and Neural Computing

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Label Smoothing