Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
Qi Lv, Xiang Deng, Gongwei Chen, Michael Yu Wang, Liqiang Nie

TL;DR
Decision Mamba introduces a multi-grained state space model with self-evolutionary regularization, effectively addressing out-of-distribution issues and overfitting in offline RL by leveraging historical information and local relationships.
Contribution
It proposes a novel multi-grained state space model with a self-evolving policy, explicitly modeling temporal and local relationships to improve offline RL performance.
Findings
Outperforms baseline methods on various tasks
Effectively handles noisy trajectories and overfitting
Enhances robustness with self-evolving policy
Abstract
While conditional sequence modeling with the transformer architecture has demonstrated its effectiveness in offline reinforcement learning (RL) tasks, it struggles to handle out-of-distribution states and actions. Existing work attempts to address this issue through data augmentation with the learned policy or by adding extra constraints from value-based RL algorithms. However, these studies still fail to overcome the following challenges: (1) insufficient use of the historical temporal information across steps, (2) overlooking the local intra-step relationships among returns-to-go (RTGs), states, and actions, and (3) overfitting to suboptimal trajectories with noisy labels. To address these challenges, we propose Decision Mamba (DM), a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy. DM explicitly models the historical hidden…
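The abstract refers to return-to-go (RTG), state, and action triplets, the standard input layout for conditional sequence models in offline RL. As a minimal sketch of that general idea (not the authors' code), the snippet below computes RTGs from a reward sequence and interleaves them with states and actions. The function names `returns_to_go` and `interleave` and the `gamma` parameter are hypothetical, chosen for illustration only.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return-to-go at step t: the (optionally discounted) sum of rewards from t onward."""
    rtgs = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the suffix sum of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtgs[t] = running
    return rtgs


def interleave(rtgs, states, actions):
    """Flatten per-step (RTG, state, action) triplets into one token sequence."""
    tokens = []
    for g, s, a in zip(rtgs, states, actions):
        tokens.extend([("rtg", g), ("state", s), ("action", a)])
    return tokens


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    rtgs = returns_to_go(rewards)
    # Placeholder state and action labels, purely illustrative.
    print(interleave(rtgs, ["s0", "s1", "s2"], ["a0", "a1", "a2"]))
```

Each step contributes three adjacent tokens, which is the local intra-step grouping the abstract mentions, while the ordering over steps carries the historical temporal information.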
Taxonomy
Topics: Neural Networks and Applications · Simulation Techniques and Applications · Reinforcement Learning in Robotics
