PASTA: Pretrained Action-State Transformer Agents

Raphael Boige; Yannis Flet-Berliac; Arthur Flajolet; Guillaume Richard; Thomas Pierrot

arXiv:2307.10936·cs.AI·December 5, 2023

Open Access 3 Reviews

TL;DR

This paper introduces PASTA, a unified pre-trained transformer framework for reinforcement learning that systematically compares design choices across diverse downstream tasks, emphasizing simplicity, robustness, and broad applicability.

Contribution

It presents a comprehensive investigation of pre-trained action-state transformer agents (PASTA), highlighting a unified methodology and practical design choices for robust RL models.

Findings

01

Tokenization at component level improves model performance.

02

Pre-training with next token prediction or masked language modeling is effective.

03

Models trained across multiple domains demonstrate versatility.
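
The first two findings can be made concrete with a small example. The following is a minimal sketch, not the authors' implementation: it assumes component-level tokenization means one token per scalar state or action component, and it builds input/target pairs for next-token prediction and for masked prediction. The function names, the `mask_ratio` of 0.15, and the use of raw scalar values as tokens are assumptions made for illustration only.

```python
import numpy as np


def component_level_tokens(states, actions):
    """Flatten a trajectory into one scalar token per state/action component.

    states: (T, state_dim), actions: (T, action_dim).
    Returns an array of length T * (state_dim + action_dim), ordered as
    s_0[0], ..., s_0[-1], a_0[0], ..., a_0[-1], s_1[0], ...
    """
    states = np.asarray(states, dtype=np.float32)
    actions = np.asarray(actions, dtype=np.float32)
    if states.shape[0] != actions.shape[0]:
        raise ValueError("states and actions must cover the same timesteps")
    # Put each timestep's state and action side by side, then flatten row by row.
    return np.concatenate([states, actions], axis=1).reshape(-1)


def next_token_pairs(tokens):
    """Next-token prediction: the target at position i is the token at i + 1."""
    return tokens[:-1], tokens[1:]


def masked_prediction_inputs(tokens, mask_ratio=0.15, mask_value=0.0, seed=0):
    """Masked prediction: hide a random subset of tokens and predict them back.

    mask_ratio and mask_value are illustrative assumptions, not paper values.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(tokens.shape[0]) < mask_ratio
    corrupted = np.where(mask, mask_value, tokens)
    return corrupted, tokens, mask


# Toy trajectory: 3 timesteps, 2 state components, 1 action component.
toy_states = np.arange(6).reshape(3, 2)
toy_actions = np.array([[10], [11], [12]])
toy_tokens = component_level_tokens(toy_states, toy_actions)
```

A modality-level alternative would instead emit one token per full state vector and one per full action vector, giving a sequence of length 2T rather than T * (state_dim + action_dim).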

Abstract

Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories. This advancement enables the models to tackle a broad spectrum of tasks, ranging from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper conducts a comprehensive investigation of models, referred to as pre-trained action-state transformer agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream…

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 6: marginally above the acceptance threshold · Confidence 4

Strengths

1. The authors conduct extensive experiments on 69 tasks in total and the results are solid.
2. There are some interesting conclusions, especially about the tokenization method, which is insightful for policy pretraining.

Weaknesses

My concerns are mainly about the experiment setup and comparison fairness. Please refer to the Questions part for the details.

Reviewer 02 · Rating 3: reject, not good enough · Confidence 3

Strengths

Motivation
* The work is well motivated. It is important to have a broad understanding of a certain class of approaches and consolidate knowledge across various approaches and domains.

Challenging Existing Results
* I think there is general value in replicating existing results under separate implementations and checking for robustness to implementation changes. There is scientific value in challenging existing results and summarizing overall insights across methods in comparison studies.

Stru

Weaknesses

Algorithmic novelty
* The main algorithmic contribution seems to be component-wise tokenization but there is no information on how that is done or ablations on it. It is not clear what tokenization works best in which environments and if there are particular patterns to how one should tokenize. It is also not clear if this tokenization might differ across simulator domains. The work at some points motivates the components as a type of logical decomposition (bi-pedal example) but this strain is n

Reviewer 03 · Rating 3: reject, not good enough · Confidence 4

Strengths

- Study is comprehensive and considers a variety of factors in pretraining including tokenization, pretraining objective, datasets, and downstream applications.
- Interesting idea to break down the state representation into modular components (such as robot's morphology) depending on the environment which seems to improve performance over vanilla input representations.
- Large set of downstream tasks evaluating both the learned representations and zero-shot transfer (23 total tasks). Tasks inc

Weaknesses

- Most of the results that the paper finds are seemingly intuitive (e.g. more diverse data is better for generalization, next-token prediction is a good objective...)
- The main improvement in performance in Figure 2 seems to come from the component level token representation although that is not highlighted as the main contribution of this work. Additionally, modular token inputs have been explored in other
- C-MTM objective seems to be comparable to GPT / BERT objectives in Figure 2, contrary to

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Ethics and Social Impacts of AI

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Softmax · Dense Connections · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection