Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
Zhi Wang, Li Zhang, Wenhao Wu, Yuanheng Zhu, Dongbin Zhao, Chunlin Chen

TL;DR
Meta-DT introduces a transformer-based offline meta-RL approach that uses world model disentanglement and self-guided prompts to achieve superior generalization without expert demonstrations.
Contribution
It proposes a novel offline meta-RL framework combining a context-aware world model with transformer sequence modeling and self-guided prompts for improved generalization.
Findings
Outperforms strong baselines on MuJoCo and Meta-World benchmarks.
Achieves superior few-shot and zero-shot generalization.
Eliminates need for expert demonstrations at test time.
Abstract
A longstanding goal of artificial general intelligence is highly capable generalists that can learn from diverse experiences and generalize to unseen tasks. The language and vision communities have seen remarkable progress toward this trend by scaling up transformer-based models trained on massive datasets, while reinforcement learning (RL) agents still suffer from poor generalization capacity under such paradigms. To tackle this challenge, we propose Meta Decision Transformer (Meta-DT), which leverages the sequential modeling ability of the transformer architecture and robust task representation learning via world model disentanglement to achieve efficient generalization in offline meta-RL. We pretrain a context-aware world model to learn a compact task representation, and inject it as a contextual condition to the causal transformer to guide task-oriented sequence generation. Then, we…
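The conditioning step described in the abstract (a compact task representation injected as context into the sequence model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the names `encode_task` and `condition_tokens` are hypothetical, the encoder is a fixed random linear map with mean pooling rather than a pretrained world model, and concatenating the task vector onto each return-to-go, state, and action token is just one plausible way to inject it.

```python
import random


def encode_task(transitions, weights):
    """Mean-pool a linear embedding of (s, a, r, s') transitions into a task vector z.

    Hypothetical stand-in for a learned context encoder; `weights` is a
    z_dim x input_dim matrix given as a list of rows.
    """
    z = [0.0] * len(weights)
    for s, a, r, s_next in transitions:
        x = list(s) + list(a) + [r] + list(s_next)
        for i, row in enumerate(weights):
            z[i] += sum(w * v for w, v in zip(row, x))
    return [v / len(transitions) for v in z]


def condition_tokens(z, returns_to_go, states, actions):
    """Prefix z to every return-to-go, state, and action token (assumed scheme).

    Returns a flat list with three tokens per timestep, in the order
    (return-to-go, state, action), each carrying the same task vector.
    """
    tokens = []
    for g, s, a in zip(returns_to_go, states, actions):
        tokens.append(list(z) + [g])
        tokens.append(list(z) + list(s))
        tokens.append(list(z) + list(a))
    return tokens


# Toy data: 2-D states, 1-D actions, 3-D task vector (all illustrative).
rng = random.Random(0)
state_dim, action_dim, z_dim = 2, 1, 3
input_dim = state_dim + action_dim + 1 + state_dim
weights = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)] for _ in range(z_dim)]

transitions = [
    ((0.0, 1.0), (0.5,), 1.0, (0.1, 0.9)),
    ((0.1, 0.9), (-0.5,), 0.0, (0.2, 0.8)),
]
z = encode_task(transitions, weights)
tokens = condition_tokens(
    z,
    returns_to_go=[1.0, 0.0],
    states=[(0.0, 1.0), (0.1, 0.9)],
    actions=[(0.5,), (-0.5,)],
)
```

In this sketch each token would then pass through a modality-specific embedding before entering a causal transformer; that part, and the training of the encoder itself, is omitted.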
Taxonomy
Topics: Advanced Data Processing Techniques · Simulation Techniques and Applications · Scientific Computing and Data Management
Methods: Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer
