Can Wikipedia Help Offline Reinforcement Learning?
Machel Reid, Yutaro Yamada, Shixiang Shane Gu

TL;DR
This paper explores using pre-trained sequence models from domains like language and vision to improve offline reinforcement learning, achieving faster convergence and state-of-the-art results across various tasks.
Contribution
It demonstrates the effectiveness of transfer learning with pre-trained sequence models for offline RL, introducing techniques to enhance cross-domain transfer and significantly speeding up training.
Findings
Accelerated training by 3-6x across environments
Achieved state-of-the-art performance on multiple tasks
Leveraged Wikipedia-pretrained and GPT-2 models for RL
Abstract
Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has tackled offline RL from the perspective of sequence modeling, with improved results as a result of the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence speeds. In this paper, we look to take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of sequence models pre-trained on other domains (vision, language) when fine-tuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward on…
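The "reinforcement learning as sequence modeling" formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Decision Transformer-style layout, in which each timestep contributes a return-to-go, a state, and an action token; the abstract does not spell out the token layout, and the function names below are hypothetical, chosen only for illustration.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end of the episode."""
    rtg: List[float] = []
    running = 0.0
    # Walk the episode backwards so each step adds its reward to the tail sum.
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    rtg.reverse()
    return rtg


def flatten_trajectory(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
) -> List[Tuple[str, Any]]:
    """Interleave (return-to-go, state, action) per timestep into one sequence.

    Assumption: this ordering follows the Decision Transformer convention;
    it is a sketch, not the paper's exact preprocessing.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")
    tokens: List[Tuple[str, Any]] = []
    for rtg, state, action in zip(returns_to_go(rewards), states, actions):
        tokens.append(("return_to_go", rtg))
        tokens.append(("state", state))
        tokens.append(("action", action))
    return tokens


if __name__ == "__main__":
    # Toy two-step episode with placeholder states and actions.
    for token in flatten_trajectory(["s0", "s1"], ["a0", "a1"], [1.0, 2.0]):
        print(token)
```

A sequence in this form can be fed to an autoregressive Transformer, which is what makes it plausible to initialize that Transformer from a model pre-trained on another domain such as language.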
Videos
Can Wikipedia Help Offline Reinforcement Learning? (Author Interview) · YouTube
Can Wikipedia Help Offline Reinforcement Learning? (Paper Explained) · YouTube
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Softmax · Residual Connection · Dense Connections · Absolute Position Encodings · Byte Pair Encoding · Dropout
