Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient

Wenlong Wang; Ivana Dusparic; Yucheng Shi; Ke Zhang; Vinny Cahill

arXiv:2410.08893·cs.LG·May 19, 2025

TL;DR

Drama is a Mamba-enabled state space world model for model-based reinforcement learning that is more efficient and scalable, achieving competitive performance on Atari benchmarks with fewer computational resources.

Contribution

The paper presents Drama, a novel SSM-based world model leveraging Mamba, with linear complexity and improved training efficiency for model-based RL.

Findings

01

Achieves competitive Atari100k scores with a 7M parameter model.

02

Scales as O(n) in memory and computation, where n is the sequence length.

03

Can be trained on standard hardware.

Abstract

Model-based reinforcement learning (RL) offers a solution to the data inefficiency that plagues most model-free RL algorithms. However, learning a robust world model often requires complex and deep architectures, which are computationally expensive and challenging to train. Within the world model, sequence models play a critical role in accurate predictions, and various architectures have been explored, each with its own challenges. Currently, recurrent neural network (RNN)-based world models struggle with vanishing gradients and capturing long-term dependencies. Transformers, on the other hand, suffer from the quadratic memory and computational complexity of self-attention mechanisms, scaling as $O(n^{2})$, where $n$ is the sequence length. To address these challenges, we propose a state space model (SSM)-based world model, Drama, specifically leveraging Mamba, that achieves $O(n)$ …
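The scaling contrast in the abstract can be illustrated with a toy comparison. The sketch below is not Drama's world model and not Mamba itself: it assumes a scalar, time-invariant linear recurrence with made-up parameters `a`, `b`, `c` (Mamba uses input-dependent, multi-dimensional parameters), and a scalar causal self-attention where queries, keys, and values are all the raw input. The function names are hypothetical. It only counts how much work each approach does as the sequence length grows.

```python
import math


def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy scalar state space recurrence (assumed fixed a, b, c):
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    Returns the outputs and the number of state updates performed."""
    h = 0.0
    ys = []
    steps = 0
    for x in xs:
        h = a * h + b * x  # one constant-cost update per time step
        ys.append(c * h)
        steps += 1
    return ys, steps


def causal_attention(xs):
    """Toy scalar causal self-attention with query = key = value = x.
    Returns the outputs and the number of query-key scores computed."""
    ys = []
    scores_computed = 0
    for t in range(len(xs)):
        # Position t attends to every position 0..t, so work grows with t.
        scores = [xs[t] * xs[s] for s in range(t + 1)]
        scores_computed += len(scores)
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        ys.append(sum(w * xs[s] for s, w in enumerate(weights)) / z)
    return ys, scores_computed


def work_counts(n):
    """Return (recurrence updates, attention scores) for a length-n input."""
    xs = [0.1 * (i % 7) for i in range(n)]
    _, ssm_steps = ssm_scan(xs)
    _, attn_scores = causal_attention(xs)
    return ssm_steps, attn_scores


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, work_counts(n))
```

Doubling the input length doubles the recurrence's update count, while the number of attention scores, n(n+1)/2 in the causal case, roughly quadruples. The recurrence also carries only a fixed-size state between steps, which is the source of the linear memory footprint in this toy setting.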

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- **Originality:** The paper introduces the novel application of Mamba-2 SSMs within MB-RL, specifically as the dynamics model in the world model. This is a new approach that addresses the limitations of existing architectures like typical RNNs and transformers, and it makes a lot of sense in my opinion.
- **Quality:** The authors provide thorough experimental evaluations on the Atari100k benchmark, demonstrating that DRAMA achieves competitive performance with significantly fewer parameters (at

Weaknesses

- **Unsupported claims about capturing long-term dependencies:** While the authors claim repeatedly that Mamba-2 effectively handles long-term dependencies, the paper provides limited direct evidence or analysis to demonstrate this capability. Including experiments or analyses that specifically test and showcase the ability to capture long-term dependencies would strengthen the paper. For instance, a task designed to require long-term memory or metrics that quantify the model's ability to captu

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The proposed idea of using SSMs for world models (WMs) is interesting, as they provide crucial benefits over training with Transformers and RNNs.
- The proposed method achieves good performance with significantly fewer parameters (7M) when compared with baselines.

Weaknesses

- It is hard to articulate where the performance gains are coming from. Section 3.2.1 discusses that DFS provides an advantage over uniform sampling. Since DFS is agnostic to most baselines, it is important to see a comparison of either Drama with uniform sampling or baselines with DFS sampling, to understand whether the gains come from the architecture or from the sampling.
- The paper is not well written and it is hard to understand the details and motivation behind the design choices. Questions 1-5 below expand

Reviewer 03 · Rating 8 · Confidence 3

Strengths

- [S1] The paper combines established methods in a novel way, effectively addressing an existing gap in world model research.
- [S2] The proposed model is computationally efficient, requiring only 7 million trainable parameters, making it accessible.

Weaknesses

- [W1] The extent to which the Mamba architecture contributes to the model's performance remains unclear. Specifically, it is unclear how DFS impacts scores across all games. Extending ablation study 3.2.1 to cover more games, or conducting a new study that replaces Mamba with an RNN or transformer, would clarify these contributions.
- [W2] While the paper emphasizes Mamba's computational efficiency, there is a lack of exact wall-clock training and inference times. The abstract claims the model

Code & Models

Repositories

realwenlongwang/drama
PyTorch · Official

Videos

Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications

Methods: Sparse Evolutionary Training · Mamba: Linear-Time Sequence Modeling with Selective State Spaces