Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling
Sili Huang, Jifeng Hu, Zhejian Yang, Liwei Yang, Tao Luo, Hechang Chen, Lichao Sun, Bo Yang

TL;DR
Decision Mamba (DM) replaces the Decision Transformer backbone with Mamba, and its hybrid variant (DM-H) combines transformer and Mamba models to handle long-term dependencies in reinforcement learning, reaching state-of-the-art results at a much lower computational cost.
Contribution
The paper introduces Decision Mamba (DM), which swaps the Decision Transformer backbone for Mamba, and Decision Mamba-Hybrid (DM-H), which integrates transformer and Mamba architectures for efficient long-horizon reinforcement learning.
Findings
DM-H achieves state-of-the-art performance on multiple benchmarks.
DM-H is 28 times faster than transformer-based baselines in long-term tasks.
The hybrid model balances prediction quality against computational efficiency.
Abstract
Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents could emerge with self-improvement in online environments by providing task contexts, such as multiple trajectories, called in-context RL. However, due to the quadratic computation complexity of attention in transformers, current in-context RL methods suffer from huge computational costs as the task horizon increases. In contrast, the Mamba model is renowned for its efficient ability to process long-term dependencies, which provides an opportunity for in-context RL to solve tasks that require long-term memory. To this end, we first implement Decision Mamba (DM) by replacing the backbone of Decision Transformer (DT). Then, we propose a Decision Mamba-Hybrid (DM-H) with the merits of…
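The scaling argument in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the function names are hypothetical, and `linear_scan` is a fixed scalar recurrence rather than Mamba's selective state-space layer, whose parameters depend on the input. It only shows why pairwise attention work grows quadratically with sequence length while a recurrent scan grows linearly.

```python
def attention_pairwise_ops(seq_len: int) -> int:
    """Count query-key score computations in causal self-attention.

    Token t attends to tokens 0..t, so the total is 1 + 2 + ... + seq_len.
    """
    return seq_len * (seq_len + 1) // 2


def recurrent_scan_ops(seq_len: int) -> int:
    """Count state updates in a recurrent scan: one per token."""
    return seq_len


def linear_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy scalar recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    Assumption: a, b, c are fixed constants chosen for illustration.
    A selective model such as Mamba makes them input-dependent.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # constant work per step, fixed-size state
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # Compare how the two counts grow with the context length.
    for n in (100, 1_000, 10_000):
        print(n, attention_pairwise_ops(n), recurrent_scan_ops(n))
```

These counts ignore hidden dimensions, constant factors, and hardware effects, so they say nothing about measured speedups such as the one reported in the findings. They only show the growth rates behind the abstract's claim.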
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Attention Is All You Need · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
