Can Wikipedia Help Offline Reinforcement Learning?

Machel Reid; Yutaro Yamada; Shixiang Shane Gu

arXiv:2201.12122 · cs.LG · July 26, 2022 · 6 cites

TL;DR

This paper explores using pre-trained sequence models from domains like language and vision to improve offline reinforcement learning, achieving faster convergence and state-of-the-art results across various tasks.

Contribution

It demonstrates the effectiveness of transfer learning with pre-trained sequence models for offline RL, introducing techniques to enhance cross-domain transfer and significantly speeding up training.

Findings

01 Accelerated training by 3-6x across environments

02 Achieved state-of-the-art performance on multiple tasks

03 Leveraged Wikipedia-pretrained and GPT-2 models for RL

Abstract

Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has looked at tackling offline RL from the perspective of sequence modeling, with improved results as a result of the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence speeds. In this paper, we look to take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of pre-trained sequence models on other domains (vision, language) when fine-tuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward on…
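To make the "reinforcement learning as sequence modeling" formulation concrete, here is a minimal sketch of how one offline trajectory could be flattened into a token sequence that a Transformer can consume. It assumes the Decision Transformer-style layout of (return-to-go, state, action) triples per timestep; that layout, and the function names `returns_to_go` and `flatten_trajectory`, are illustrative assumptions and are not taken from this page or from the official repository.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end of the trajectory."""
    rtg: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    rtg.reverse()
    return rtg


def flatten_trajectory(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
) -> List[Tuple[str, Any]]:
    """Interleave (return-to-go, state, action) per timestep into one sequence.

    Assumed layout (hypothetical, Decision Transformer-style):
        R_1, s_1, a_1, R_2, s_2, a_2, ...
    Each element is tagged with its modality so that a model could apply a
    separate embedding per modality before the shared Transformer layers.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions, and rewards must have equal length")
    tokens: List[Tuple[str, Any]] = []
    for rtg, state, action in zip(returns_to_go(rewards), states, actions):
        tokens.append(("return_to_go", rtg))
        tokens.append(("state", state))
        tokens.append(("action", action))
    return tokens


if __name__ == "__main__":
    # Toy three-step trajectory with placeholder states and actions.
    sequence = flatten_trajectory(
        states=["s1", "s2", "s3"],
        actions=["a1", "a2", "a3"],
        rewards=[1.0, 0.0, 2.0],
    )
    for modality, value in sequence:
        print(modality, value)
```

Under this framing, a pre-trained language model would be fine-tuned to predict the action tokens given the preceding sequence; the per-modality embeddings and any transfer-specific losses are outside the scope of this sketch.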

Code & Models

Repositories

machelreid/can-wikipedia-help-offline-rl
pytorch · Official

Videos

Can Wikipedia Help Offline Reinforcement Learning? (Author Interview) · youtube

Can Wikipedia Help Offline Reinforcement Learning? (Paper Explained) · youtube

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Softmax · Residual Connection · Dense Connections · Absolute Position Encodings · Byte Pair Encoding · Dropout