PASTA: Pretrained Action-State Transformer Agents
Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot

TL;DR
This paper introduces PASTA, a unified pre-trained transformer framework for reinforcement learning that systematically compares design choices across diverse downstream tasks, emphasizing simplicity, robustness, and broad applicability.
Contribution
It presents a comprehensive investigation of pre-trained action-state transformer agents (PASTA), highlighting a unified methodology and practical design choices for robust RL models.
Findings
Tokenization at component level improves model performance.
Pre-training with next token prediction or masked language modeling is effective.
Models trained across multiple domains demonstrate versatility.
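The first two findings can be made concrete with a toy sketch. This page does not describe the paper's implementation, so everything below is an assumption made for illustration: the trajectory layout, the token labels (`s0`, `a0`, ...), the `MASK` placeholder, and the mask ratio are all hypothetical. The sketch contrasts one token per whole state or action vector with one token per scalar component, then builds training pairs for next-token prediction and for masked prediction.

```python
import random
from typing import List, Tuple

# Hypothetical toy trajectory: each step is (state vector, action vector).
Step = Tuple[List[float], List[float]]

MASK = None  # placeholder standing in for a learned mask embedding (assumption)


def modality_level_tokens(trajectory: List[Step]) -> list:
    """One token per whole state vector and one per whole action vector."""
    tokens = []
    for state, action in trajectory:
        tokens.append(("state", tuple(state)))
        tokens.append(("action", tuple(action)))
    return tokens


def component_level_tokens(trajectory: List[Step]) -> list:
    """One token per scalar component of each state and each action."""
    tokens = []
    for state, action in trajectory:
        tokens.extend((f"s{i}", v) for i, v in enumerate(state))
        tokens.extend((f"a{j}", v) for j, v in enumerate(action))
    return tokens


def next_token_pairs(tokens: list) -> Tuple[list, list]:
    """Causal objective: the target sequence is the input shifted by one."""
    return tokens[:-1], tokens[1:]


def masked_inputs(tokens: list, mask_ratio: float, rng: random.Random):
    """Masked objective: hide a random subset and keep the hidden tokens as targets."""
    n_masked = max(1, round(mask_ratio * len(tokens)))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_masked))
    hidden = set(masked_idx)
    inputs = [MASK if i in hidden else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in masked_idx}
    return inputs, targets


# Two steps, 3-dimensional states, 2-dimensional actions.
trajectory = [([0.1, 0.2, 0.3], [1.0, -1.0]), ([0.4, 0.5, 0.6], [0.5, 0.0])]
coarse = modality_level_tokens(trajectory)
fine = component_level_tokens(trajectory)
print(len(coarse), len(fine))  # prints "4 10"
```

The component-level sequence is longer (here 10 tokens instead of 4), so in a sketch like this the finer granularity is traded against a longer context for the transformer.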
Abstract
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories. This advancement enables the models to tackle a broad spectrum of tasks, ranging from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper conducts a comprehensive investigation of models, referred to as pre-trained action-state transformer agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream…
Peer Reviews
Decision: Submitted to ICLR 2024
1. The authors conduct extensive experiments on 69 tasks in total and the results are solid.
2. There are some interesting conclusions, especially about the tokenization method, which is insightful for policy pretraining.
My concerns are mainly about the experiment setup and comparison fairness. Please refer to the Questions part for the details.
Motivation: The work is well motivated. It is important to have a broad understanding of a certain class of approaches and consolidate knowledge across various approaches and domains.
Challenging Existing Results: I think there is general value in replicating existing results under separate implementations and checking for robustness to implementation changes. There is scientific value in challenging existing results and summarizing overall insights across methods in comparison studies.
Stru
Algorithmic novelty: The main algorithmic contribution seems to be component-wise tokenization, but there is no information on how that is done or ablations on it. It is not clear what tokenization works best in which environments and if there are particular patterns to how one should tokenize. It is also not clear if this tokenization might differ across simulator domains. The work at some points motivates the components as a type of logical decomposition (bi-pedal example) but this strain is n
- Study is comprehensive and considers a variety of factors in pretraining including tokenization, pretraining objective, datasets, and downstream applications.
- Interesting idea to break down the state representation into modular components (such as robot's morphology) depending on the environment which seems to improve performance over vanilla input representations.
- Large set of downstream tasks evaluating both the learned representations and zero-shot transfer (23 total tasks). Tasks inc
Most of the results that the paper finds are seemingly intuitive (e.g. more diverse data is better for generalization, next-token prediction is a good objective...). The main improvement in performance in Figure 2 seems to come from the component level token representation although that is not highlighted as the main contribution of this work. Additionally, modular token inputs have been explored in other C-MTM objective seems to be comparable to GPT / BERT objectives in Figure 2, contrary to
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Ethics and Social Impacts of AI
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Softmax · Dense Connections · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection
