DRDT3: Diffusion-Refined Decision Test-Time Training Model
Xingshuai Huang, Di Wu, Benoit Boulet

TL;DR
DRDT3 introduces a novel framework combining decision transformers, RNN-based test-time training, and diffusion models to enhance trajectory modeling and policy optimization in offline reinforcement learning tasks.
Contribution
The paper proposes the Decision TTT (DT3) module and a unified diffusion-refined framework, and reports performance exceeding existing Decision Transformer and offline RL methods.
Findings
DRDT3 outperforms standard Decision Transformer on multiple tasks.
DRDT3 achieves state-of-the-art results on the D4RL benchmark.
The diffusion refinement process improves policy quality progressively.
Abstract
Decision Transformer (DT), a trajectory modelling method, has shown competitive performance compared to traditional offline reinforcement learning (RL) approaches on various classic control tasks. However, it struggles to learn optimal policies from suboptimal, reward-labelled trajectories. In this study, we explore the use of conditional generative modelling to facilitate trajectory stitching given its high-quality data generation ability. Additionally, recent advancements in Recurrent Neural Networks (RNNs) have shown their linear complexity and competitive sequence modelling performance over Transformers. We leverage the Test-Time Training (TTT) layer, an RNN that updates hidden states during testing, to model trajectories in the form of DT. We introduce a unified framework, called Diffusion-Refined Decision TTT (DRDT3), to achieve performance beyond DT models. Specifically, we…
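The abstract describes the TTT layer as an RNN whose hidden state keeps being updated at test time. As a rough illustration only, and not the authors' DT3 module, the sketch below treats the hidden state as a weight matrix `W` and takes one gradient step per token on a self-supervised reconstruction loss. The function names, the identity "views" (used here in place of any learned projections), the zero initialization, and the learning rate are all assumptions made for this example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def ttt_linear_scan(tokens: List[Vector], lr: float = 0.25) -> List[Vector]:
    """Hypothetical, simplified TTT-style scan over a token sequence.

    The hidden state is a square matrix W. For each token x we take one
    gradient step on the loss ||W x - x||^2 and then emit W x computed with
    the updated state. Cost is linear in the sequence length.
    """
    dim = len(tokens[0])
    W = [[0.0] * dim for _ in range(dim)]  # assumed zero initial state
    outputs: List[Vector] = []
    for x in tokens:
        # Reconstruction error of the current hidden state on this token.
        residual = [p - t for p, t in zip(matvec(W, x), x)]
        # Gradient of ||W x - x||^2 with respect to W is 2 * residual * x^T.
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * 2.0 * residual[i] * x[j]
        outputs.append(matvec(W, x))
    return outputs


if __name__ == "__main__":
    # Feeding the same token repeatedly: the output moves toward the token,
    # showing that the state adapts as the sequence is processed.
    for out in ttt_linear_scan([[1.0, 0.0]] * 3):
        print(out)
```

In this toy setting the update rule is the same during training and inference, which is the sense in which the hidden state is "trained" at test time. A DT-style model would feed embedded return, state, and action tokens through such a layer instead of raw vectors.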
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
