Pre-training with Synthetic Data Helps Offline Reinforcement Learning
Zecheng Wang, Che Wang, Zixuan Dong, Keith Ross

TL;DR
Pre-training with simple synthetic data, including data from Markov chains, can enhance offline reinforcement learning performance, challenging the necessity of language-based pre-training.
Contribution
The paper demonstrates that synthetic data pre-training, even without language, can match or surpass language-based gains and improve offline DRL algorithms like CQL.
Findings
Synthetic IID data pre-training matches language pre-training performance.
Markov chain generated data further improves results.
Pre-training enhances CQL performance on D4RL datasets.
Abstract
Recently, it has been shown that for offline deep reinforcement learning (DRL), pre-training Decision Transformer with a large language corpus can improve downstream performance (Reid et al., 2022). A natural question to ask is whether this performance gain can only be achieved with language pre-training, or can be achieved with simpler pre-training schemes which do not involve language. In this paper, we first show that language is not essential for improved performance, and indeed pre-training with synthetic IID data for a small number of updates can match the performance gains from pre-training with a large language corpus; moreover, pre-training with data generated by a one-step Markov chain can further improve the performance. Inspired by these experimental results, we then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm, which is…
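The two synthetic data schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian logits, the softmax-with-temperature construction of the transition matrix, the uniform IID distribution, and the values (100 states, temperature 1.0, sequence length 1024) are all assumptions chosen for illustration.

```python
import math
import random


def softmax(logits, temperature):
    """Turn logits into a probability vector; lower temperature gives a peakier distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def make_transition_matrix(num_states, temperature, rng):
    """Hypothetical helper: one random softmax row per state (assumed construction)."""
    return [
        softmax([rng.gauss(0.0, 1.0) for _ in range(num_states)], temperature)
        for _ in range(num_states)
    ]


def sample_markov_sequence(transition, length, rng):
    """Sample tokens from a one-step Markov chain: each token depends only on the previous one."""
    num_states = len(transition)
    state = rng.randrange(num_states)  # uniform initial state (assumption)
    sequence = [state]
    for _ in range(length - 1):
        state = rng.choices(range(num_states), weights=transition[state], k=1)[0]
        sequence.append(state)
    return sequence


def sample_iid_sequence(num_states, length, rng):
    """Sample tokens independently and uniformly (one possible IID scheme, assumed here)."""
    return [rng.randrange(num_states) for _ in range(length)]


rng = random.Random(0)
transition = make_transition_matrix(num_states=100, temperature=1.0, rng=rng)
markov_tokens = sample_markov_sequence(transition, length=1024, rng=rng)
iid_tokens = sample_iid_sequence(num_states=100, length=1024, rng=rng)
```

In a setup like this, the resulting token sequences would serve as next-token prediction targets during pre-training, before fine-tuning on the offline RL dataset.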
Peer Reviews
Decision: ICLR 2024 poster
1. Clarity of Presentation: The paper is well-structured, making it accessible even to those who may not be deeply versed in the domain. The significance of the research question is conveyed effectively, which facilitates a quick grasp of the paper's importance. 2. Innovation in Data Construction: The methodology employed for the generation of synthetic data is both novel and straightforward, potentially offering a simpler alternative to more complex data generation strategies.
1. Methodological Justification: The rationale behind the adoption of a Markov Chain for synthetic data generation requires further elaboration. While the introduction suggests that understanding the underlying question is crucial for enhancing pre-training in deep reinforcement learning (DRL), the link between this understanding and the proposed method is not convincingly established. 2. Need More Deep Analysis: The paper primarily demonstrates the efficacy of the proposed method without a rob
1. This paper proposes a simple yet effective pre-training method with synthetic data for Decision Transformer. 2. Results demonstrate that the proposed pre-training method with CQL can achieve significant improvements.
1. The experiments lack comparisons with some pre-trained DT models, such as Future-conditioned Unsupervised Pretraining for Decision Transformer (https://proceedings.mlr.press/v202/xie23b/xie23b.pdf). 2. It would be more convincing to evaluate the proposed methods on more tasks.
* Presents empirical evidence that contradicts the prevailing belief that language data is essential for pre-training models for offline Deep Reinforcement Learning. * Demonstrates the benefits of synthetic pre-training data for both transformer and Q-learning based approaches to offline DRL. * Reports ablation studies to investigate the influence of various parameters such as the size of the state space, temperature, and order of the Markov Chain.
* The paper does not investigate the impact of the number of updates during fine-tuning, which is kept constant at 100k for DT and 1M for CQL. It would be useful to understand the relationship between the parameters of the synthetic data and the number of updates in fine-tuning. This has the practical implication that fine-tuning data is typically task-specific and its availability may be severely limited. Alternatively, computational constraints may limit the fine-tuning budget. * The paper do
Taxonomy
Topics: Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Adam · Residual Connection · Q-Learning · Layer Normalization
