Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?
Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Shengchao Hu, Mengzhu Wang, Shouling Ji, Jincai Huang, Li Shen

TL;DR
This paper investigates the compatibility of Mamba, a linear-time sequence model, with trajectory optimization in offline reinforcement learning, demonstrating that a specially designed Decision Mamba outperforms existing methods with fewer parameters.
Contribution
The work introduces a Transformer-like Decision Mamba tailored for offline RL, highlighting the importance of the hidden attention mechanism and demonstrating superior performance with fewer parameters.
Findings
Decision Mamba outperforms Decision Transformer on the Atari and MuJoCo benchmarks.
Longer input sequences bring little benefit, which motivates the adoption of a Transformer-like DeMa.
The hidden attention mechanism is critical, and it remains effective without position embeddings.
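To make the "hidden attention" finding more concrete, here is a minimal sketch in plain Python. It assumes a scalar, single-channel linear state-space recurrence; this is a simplification for illustration, not the authors' DeMa implementation. The names `ssm_scan` and `hidden_attention` are hypothetical, and the coefficients `a`, `b`, `c` are random stand-ins for the input-dependent parameters a selective model would compute. The sketch shows that the linear-time recurrence and an unrolled, attention-like weighted sum over past inputs give the same outputs.

```python
import math
import random


def ssm_scan(a, b, c, x):
    """Linear-time recurrence: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t."""
    h, ys = 0.0, []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys


def hidden_attention(a, b, c):
    """Unrolled weights: alpha[t][j] = c_t * (a_{j+1} * ... * a_t) * b_j for j <= t."""
    n = len(a)
    alpha = [[0.0] * n for _ in range(n)]
    for t in range(n):
        decay = 1.0  # running product of a_k for k in (j, t]
        for j in range(t, -1, -1):
            alpha[t][j] = c[t] * decay * b[j]
            decay *= a[j]
    return alpha


# Illustrative coefficients (assumed, not taken from the paper).
random.seed(0)
n = 6
a = [random.uniform(0.5, 0.9) for _ in range(n)]
b = [random.uniform(-1.0, 1.0) for _ in range(n)]
c = [random.uniform(-1.0, 1.0) for _ in range(n)]
x = [random.uniform(-1.0, 1.0) for _ in range(n)]

y_scan = ssm_scan(a, b, c, x)
alpha = hidden_attention(a, b, c)
y_attn = [sum(alpha[t][j] * x[j] for j in range(n)) for t in range(n)]

# Both views of the same model agree up to floating-point error.
assert all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(y_scan, y_attn))
```

Because each weight carries a product of decay factors, and every `a_k` here lies strictly between 0 and 1, the decay term for tokens further back shrinks geometrically. This may offer one intuition for why long input sequences add little. The weights are also causal and order-aware by construction, which is consistent with the model working without a separate position embedding.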
Abstract
Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL). Yet these methods pose challenges due to their substantial parameter size and limited scalability, which is particularly critical in sequential decision-making scenarios where resources are constrained, such as in robots and drones with limited computational power. Mamba, a promising new linear-time sequence model, offers performance on par with transformers while delivering substantially fewer parameters on long sequences. As it remains unclear whether Mamba is compatible with trajectory optimization, this work aims to conduct comprehensive experiments to explore the potential of Decision Mamba (dubbed DeMa) in offline RL from the aspect of data structures and essential components with the following insights: (1) Long sequences impose a significant computational…
Taxonomy
Topics: Robotic Locomotion and Control
Methods: Attention Is All You Need · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout
