Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling
Jiawei Xu, Rui Yang, Shuang Qiu, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han

TL;DR
This paper demonstrates that sequence modeling methods like Decision Transformer are inherently robust to data corruption in offline reinforcement learning, and introduces RDT, a robust variant with techniques to further improve performance under noisy data.
Contribution
The study reveals the robustness of vanilla sequence models against data corruption and proposes RDT, a new method incorporating robust techniques for improved offline RL performance.
Findings
RDT outperforms prior offline RL methods under data corruption.
Sequence modeling methods like Decision Transformer are inherently robust to noisy data.
RDT maintains high performance even with combined training and testing data perturbations.
Abstract
Learning policy from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making while avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, posing a significant challenge for existing offline RL methods, particularly when the real-world data is limited. Our study reveals that prior research focusing on adapting predominant offline RL methods based on temporal difference learning still falls short under data corruption when the dataset is limited. In contrast, we discover that vanilla sequence modeling methods, such as Decision Transformer, exhibit robustness against data corruption, even without specialized modifications. To unlock the full potential of sequence modeling, we propose Robust Decision Transformer (RDT) by incorporating three simple…
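To make the notion of "data corruption" in an offline dataset concrete, the following is a minimal sketch of one way to perturb a fraction of stored samples with random noise. It is an illustration under stated assumptions, not the paper's protocol: the function name `corrupt_rows`, the parameters `fraction` and `scale`, and the choice of Gaussian noise scaled by each dimension's standard deviation are all hypothetical, since this summary does not describe the exact corruption procedure.

```python
import numpy as np


def corrupt_rows(data, fraction=0.3, scale=1.0, seed=0):
    """Return a noisy copy of `data` and the indices of the perturbed rows.

    Hypothetical helper: adds Gaussian noise to a random subset of rows.
    The noise model and all parameter names are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)  # expects shape (num_samples, dim)
    n, dim = data.shape

    # Choose which rows (e.g. states, actions, or rewards) to corrupt.
    k = int(round(fraction * n))
    idx = rng.choice(n, size=k, replace=False)

    # Scale the noise by the per-dimension standard deviation of the data.
    std = data.std(axis=0, keepdims=True)
    corrupted = data.copy()
    corrupted[idx] += scale * std * rng.standard_normal((k, dim))
    return corrupted, np.sort(idx)


if __name__ == "__main__":
    states = np.arange(40.0).reshape(10, 4)  # toy stand-in for dataset states
    noisy_states, idx = corrupt_rows(states, fraction=0.3)
    print("corrupted rows:", idx)
```

Applying the same kind of helper to states, actions, or rewards separately would give a simple way to compare how different offline RL methods degrade as `fraction` or `scale` grows.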
Taxonomy
Topics: Machine Learning and Data Classification · Network Security and Intrusion Detection · Imbalanced Data Classification Techniques
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Byte Pair Encoding · Layer Normalization · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam
