Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

Zhi Wang; Li Zhang; Wenhao Wu; Yuanheng Zhu; Dongbin Zhao; Chunlin Chen

arXiv:2410.11448 · cs.LG · October 25, 2024 · 2 cites

TL;DR

Meta-DT introduces a transformer-based offline meta-RL approach that uses world model disentanglement and self-guided prompts to achieve superior generalization without expert demonstrations.

Contribution

It proposes a novel offline meta-RL framework combining a context-aware world model with transformer sequence modeling and self-guided prompts for improved generalization.

Findings

1. Outperforms strong baselines on MuJoCo and Meta-World benchmarks.
2. Achieves superior few-shot and zero-shot generalization.
3. Eliminates the need for expert demonstrations at test time.

Abstract

A longstanding goal of artificial general intelligence is highly capable generalists that can learn from diverse experiences and generalize to unseen tasks. The language and vision communities have seen remarkable progress toward this trend by scaling up transformer-based models trained on massive datasets, while reinforcement learning (RL) agents still suffer from poor generalization capacity under such paradigms. To tackle this challenge, we propose Meta Decision Transformer (Meta-DT), which leverages the sequential modeling ability of the transformer architecture and robust task representation learning via world model disentanglement to achieve efficient generalization in offline meta-RL. We pretrain a context-aware world model to learn a compact task representation, and inject it as a contextual condition to the causal transformer to guide task-oriented sequence generation. Then, we…
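The conditioning idea in the abstract (encode recent experience into a compact task representation, then feed it to the causal transformer alongside the trajectory) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `make_linear`, `encode_context`, and `condition_tokens` are hypothetical, the random linear map stands in for a pretrained world-model encoder, and appending the task vector to every token is just one simple way to inject a condition. It is not taken from the paper or the official repository.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
# One transition: (state, action, reward, next_state)
Transition = Tuple[Vector, Vector, float, Vector]


def make_linear(in_dim: int, out_dim: int, seed: int = 0) -> List[Vector]:
    """Random weights standing in for a trained context encoder (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def encode_context(context: Sequence[Transition], weights: List[Vector]) -> Vector:
    """Mean-pool linear features of the transitions into one task vector z."""
    z = [0.0] * len(weights)
    for state, action, reward, next_state in context:
        x = list(state) + list(action) + [reward] + list(next_state)
        for i, row in enumerate(weights):
            z[i] += sum(w * v for w, v in zip(row, x))
    return [v / len(context) for v in z]


def condition_tokens(
    returns_to_go: Sequence[float],
    states: Sequence[Vector],
    actions: Sequence[Vector],
    z: Vector,
) -> List[Vector]:
    """Append z to each (return-to-go, state, action) token, timestep by timestep."""
    tokens: List[Vector] = []
    for rtg, s, a in zip(returns_to_go, states, actions):
        tokens.append([rtg] + list(z))
        tokens.append(list(s) + list(z))
        tokens.append(list(a) + list(z))
    return tokens


# Toy usage: 2-D states, 1-D actions, 4-D task representation.
weights = make_linear(in_dim=6, out_dim=4)
context = [
    ([0.0, 1.0], [0.5], 1.0, [0.1, 1.1]),
    ([0.1, 1.1], [0.4], 0.5, [0.2, 1.2]),
]
z = encode_context(context, weights)
tokens = condition_tokens([1.5, 0.5], [[0.0, 1.0], [0.1, 1.1]], [[0.5], [0.4]], z)
```

In a full model, the conditioned tokens would pass through embedding layers and a causal transformer that predicts the next action; the sketch stops at building the input sequence.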

Code & Models

Repositories

nju-rl/meta-dt (official)

Videos

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement · SlidesLive

Taxonomy

Topics

Advanced Data Processing Techniques · Simulation Techniques and Applications · Scientific Computing and Data Management

Methods

Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer