Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

Sili Huang; Jifeng Hu; Zhejian Yang; Liwei Yang; Tao Luo; Hechang Chen; Lichao Sun; Bo Yang

arXiv:2406.00079 · cs.LG · June 4, 2024 · 2 cites



TL;DR

Decision Mamba (DM) replaces the transformer backbone of Decision Transformer with Mamba, and its hybrid variant (DM-H) combines transformer and Mamba models to improve long-term dependency handling in in-context reinforcement learning, achieving state-of-the-art results at a substantially lower computational cost.
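The efficiency claim rests on Mamba's selective state-space recurrence, which processes a sequence in a single pass whose cost grows linearly with its length, instead of comparing every pair of positions as attention does. The sketch below is a simplified, single-channel illustration assuming a diagonal state matrix, input-dependent step sizes and B/C vectors, and plain Python loops. It is not the authors' code or the hardware-aware parallel scan used by the official Mamba kernels; the function name `selective_scan` and its arguments are hypothetical.

```python
import math

def selective_scan(x, delta, A, B, C):
    """Hypothetical single-channel selective scan (Mamba-style recurrence).

    x:     list of L scalar inputs
    delta: list of L positive step sizes (input-dependent in Mamba)
    A:     list of N negative diagonal state coefficients
    B, C:  lists of L vectors of length N (input-dependent in Mamba)
    Returns L scalar outputs; total work is O(L * N), linear in L.
    """
    N = len(A)
    h = [0.0] * N  # hidden state carried from one time step to the next
    ys = []
    for t in range(len(x)):
        for n in range(N):
            a_bar = math.exp(delta[t] * A[n])  # zero-order-hold discretization of A
            b_bar = delta[t] * B[t][n]         # simplified (Euler) discretization of B
            h[n] = a_bar * h[n] + b_bar * x[t]
        ys.append(sum(C[t][n] * h[n] for n in range(N)))
    return ys

# Usage: an impulse at t=0 decays through the state as exp(delta * A) per step.
ys = selective_scan([1.0, 0.0, 0.0], [1.0] * 3, [-1.0], [[1.0]] * 3, [[1.0]] * 3)
```

Because each step only updates a fixed-size state `h`, memory of the distant past is carried forward without revisiting earlier tokens, which is what makes long contexts cheap compared with attention.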

Contribution

The paper introduces Decision Mamba (DM), which replaces the transformer backbone of Decision Transformer with Mamba, and Decision Mamba-Hybrid (DM-H), which integrates transformer and Mamba architectures for efficient long-term in-context reinforcement learning.

Findings

01

DM-H achieves state-of-the-art performance on multiple benchmarks.

02

DM-H is 28 times faster than transformer-based baselines in long-term tasks.

03

The hybrid model effectively balances prediction quality and computational efficiency.
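The speed-up in long-term tasks follows from how work scales with context length L: causal self-attention computes on the order of L²/2 query-key scores per head, while a selective scan performs L × N state updates per channel. The back-of-the-envelope sketch below counts only these dominant inner-loop operations and assumes a state size of N = 16; it ignores projections, constant factors, and hardware effects, and it does not reproduce the paper's 28× measurement. Both helper names are hypothetical.

```python
def causal_attention_pairs(L):
    """Query-key dot products in one causal self-attention head: O(L^2)."""
    return L * (L + 1) // 2

def scan_updates(L, N=16):
    """State updates in one selective-scan channel with state size N: O(L * N)."""
    return L * N

# The work ratio equals (L + 1) / (2 * N), so it grows linearly with L.
for L in (300, 3_000, 30_000):
    ratio = causal_attention_pairs(L) / scan_updates(L)
    print(f"L={L:>6}: attention/scan work ratio ~ {ratio:,.0f}x")
```

The point of the comparison is the trend, not the absolute numbers: every tenfold increase in context length makes attention roughly ten times more expensive relative to the scan.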

Abstract

Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents could emerge with self-improvement in online environments by providing task contexts, such as multiple trajectories, called in-context RL. However, due to the quadratic computation complexity of attention in transformers, current in-context RL methods suffer from huge computational costs as the task horizon increases. In contrast, the Mamba model is renowned for its efficient ability to process long-term dependencies, which provides an opportunity for in-context RL to solve tasks that require long-term memory. To this end, we first implement Decision Mamba (DM) by replacing the backbone of Decision Transformer (DT). Then, we propose a Decision Mamba-Hybrid (DM-H) with the merits of…
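The abstract frames decision-making as sequential generation in the style of Decision Transformer, where each step of a trajectory becomes a (return-to-go, state, action) triple and in-context RL feeds several such trajectories as one prompt. The sketch below assumes undiscounted returns-to-go and a plain concatenation of whole trajectories as the context; the helper names are hypothetical, and the exact context layout used by DM and DM-H may differ.

```python
from typing import List, Tuple

def returns_to_go(rewards: List[float]) -> List[float]:
    """R_hat_t = sum of rewards from step t to the end of the episode (undiscounted)."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def to_tokens(states, actions, rewards) -> List[Tuple[str, object]]:
    """Interleave (return-to-go, state, action) for each step, Decision Transformer style."""
    tokens: List[Tuple[str, object]] = []
    for R, s, a in zip(returns_to_go(rewards), states, actions):
        tokens += [("rtg", R), ("state", s), ("action", a)]
    return tokens

def in_context_sequence(trajectories) -> List[Tuple[str, object]]:
    """Hypothetical in-context RL prompt: several trajectories concatenated in order."""
    seq: List[Tuple[str, object]] = []
    for states, actions, rewards in trajectories:
        seq += to_tokens(states, actions, rewards)
    return seq

# Usage: four 3-step trajectories yield a context of 4 * 3 * 3 = 36 tokens.
traj = (["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0])
context = in_context_sequence([traj] * 4)
```

The context length is 3 × horizon × number of trajectories, so attention cost grows with the square of that product as the horizon increases, which is the bottleneck the abstract attributes to transformer-based in-context RL.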


Videos

Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Attention Is All You Need · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer