Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?

Yang Dai; Oubo Ma; Longfei Zhang; Xingxing Liang; Shengchao Hu; Mengzhu Wang; Shouling Ji; Jincai Huang; Li Shen

arXiv:2405.12094·cs.LG·October 29, 2024

Open Access · 1 Repo

TL;DR

This paper investigates the compatibility of Mamba, a linear-time sequence model, with trajectory optimization in offline reinforcement learning, demonstrating that a specially designed Decision Mamba outperforms existing methods with fewer parameters.

Contribution

The work introduces a Transformer-like Decision Mamba tailored for offline RL, highlighting the importance of the hidden attention mechanism and demonstrating superior performance with fewer parameters.

Findings

01. Decision Mamba outperforms Decision Transformer on the Atari and MuJoCo benchmarks.

02. Long input sequences bring little benefit, which motivates adopting a Transformer-like DeMa.

03. The hidden attention mechanism is critical, and it remains effective without position embedding.

Abstract

Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL). Yet these methods pose challenges due to their substantial parameter size and limited scalability, which is particularly critical in sequential decision-making scenarios where resources are constrained, such as robots and drones with limited computational power. Mamba, a promising new linear-time sequence model, offers performance on par with transformers while delivering substantially fewer parameters on long sequences. As it remains unclear whether Mamba is compatible with trajectory optimization, this work conducts comprehensive experiments to explore the potential of Decision Mamba (dubbed DeMa) in offline RL from the aspect of data structures and essential components, with the following insights: (1) Long sequences impose a significant computational…
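To make "linear-time sequence model" and the "hidden attention" finding more concrete, here is a minimal sketch, not the paper's DeMa implementation. It assumes a toy state space recurrence with scalar, time-invariant parameters `a`, `b`, `c` (Mamba itself uses input-dependent, multi-dimensional parameters), and the function names are hypothetical. It shows that a single left-to-right pass and an explicit lower-triangular weight matrix over past inputs compute the same outputs.

```python
from typing import List, Sequence


def linear_scan(xs: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """One pass over xs: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    Cost grows linearly with len(xs). Scalar, fixed a/b/c are a simplifying
    assumption for illustration only.
    """
    h = 0.0
    ys: List[float] = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def implicit_attention(length: int, a: float, b: float, c: float) -> List[List[float]]:
    """Unroll the recurrence into weights w[t][s] = c * a**(t - s) * b for s <= t.

    Row t lists how much each earlier input x_s contributes to y_t, which is
    the sense in which the recurrence hides an attention-like matrix.
    """
    return [
        [c * (a ** (t - s)) * b if s <= t else 0.0 for s in range(length)]
        for t in range(length)
    ]


def apply_weights(weights: List[List[float]], xs: Sequence[float]) -> List[float]:
    """Compute y_t = sum_s w[t][s] * x_s (quadratic in sequence length)."""
    return [sum(w * x for w, x in zip(row, xs)) for row in weights]


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    print(linear_scan(xs, a=0.5, b=1.0, c=2.0))
    print(apply_weights(implicit_attention(len(xs), 0.5, 1.0, 2.0), xs))
```

Both calls print the same values. The weight matrix depends only on the distance `t - s` here, so relative order is already encoded by the recurrence, which is one intuition for why such a model can work without a separate position embedding.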

Code & Models

Repositories

AndssY/DeMa
PyTorch · Official

Taxonomy

Topics: Robotic Locomotion and Control

Methods: Attention Is All You Need · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout