LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba

Yinuo Wang; Gavin Tao

arXiv:2508.11849 · cs.RO · December 16, 2025

TL;DR

LocoMamba is a vision-driven deep reinforcement learning framework for locomotion. It uses stacked Mamba layers to fuse proprioceptive and depth-image tokens in near-linear time, capturing long-range dependencies, and trains the policy end-to-end with Proximal Policy Optimization.

Contribution

The paper introduces LocoMamba, integrating Mamba layers into a vision-based DRL framework for faster, more robust training and better generalization in locomotion tasks.

Findings

01. Achieves higher success rates and fewer collisions than baselines.

02. Converges faster with fewer training updates.

03. Generalizes well to unseen terrains and obstacle densities.

Abstract

We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact…
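To make the "near-linear-time selective scanning" in the abstract concrete, here is a minimal sketch of a selective state-space recurrence over a short token sequence. It is not the authors' implementation: the single scalar feature per token, the diagonal state matrix `a`, and the weights `w_delta`, `w_b`, and `w_c` are illustrative assumptions, and real Mamba layers use learned projections over vector-valued tokens with a hardware-aware parallel scan. The sketch only shows the two properties the abstract relies on: the step size and the input/output maps depend on the current token (selectivity), and the cost is one state update per token.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a, w_delta, w_b, w_c):
    """Single-channel selective SSM (hypothetical toy version).

    xs      : list of floats, one scalar feature per token
    a       : list of negative floats, diagonal state matrix (length N)
    w_delta : float, weight producing the input-dependent step size
    w_b,w_c : lists of floats (length N), weights for the input-dependent
              input and output maps
    Returns (outputs, final_state).
    """
    n = len(a)
    h = [0.0] * n
    ys = []
    for x in xs:  # one pass over the tokens: O(len(xs) * N)
        delta = softplus(w_delta * x)   # token-dependent step size
        b = [w * x for w in w_b]        # token-dependent input map
        c = [w * x for w in w_c]        # token-dependent output map
        # Discretized update: decay the old state, then write the new token.
        h = [math.exp(delta * a[i]) * h[i] + delta * b[i] * x for i in range(n)]
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys, h


# Assumed toy sequence: one proprioceptive token followed by four
# depth-patch tokens, each reduced to a single scalar for illustration.
tokens = [0.3] + [0.1, -0.2, 0.5, 0.0]
outputs, state = selective_scan(
    tokens, a=[-1.0, -0.5], w_delta=1.0, w_b=[1.0, 1.0], w_c=[1.0, 1.0]
)
print(len(outputs), state)
```

Because the loop touches each token once and carries only a fixed-size state, doubling the number of patch tokens roughly doubles the work, in contrast to the quadratic cost of full self-attention over the same tokens.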
