LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba
Yinuo Wang, Gavin Tao

TL;DR
LocoMamba is a vision-driven deep reinforcement learning (DRL) framework that uses Mamba layers and end-to-end training to model long-range dependencies efficiently and improve locomotion performance in complex environments.
Contribution
The paper introduces LocoMamba, which integrates Mamba layers into a vision-based DRL framework to achieve faster, more robust training and better generalization on locomotion tasks.
Findings
Achieves higher success rates and fewer collisions than baselines.
Converges faster with fewer training updates.
Generalizes well to unseen terrains and obstacle densities.
Abstract
We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact…
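To make the "near-linear-time selective scanning" step concrete, here is a minimal sketch of a simplified diagonal selective state-space recurrence over a token sequence. It is not the authors' implementation: real Mamba layers also include gating, a short convolution, and a hardware-aware parallel scan, all omitted here. The names `selective_scan`, `W_delta`, `W_B`, `W_C`, the sizes `D` and `N`, and the random stand-ins for the MLP and CNN tokens are assumptions made for illustration.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size non-negative.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def selective_scan(tokens, A, W_delta, W_B, W_C):
    """Simplified diagonal selective SSM over a token sequence.

    tokens : list of L tokens, each a list of D floats
    A      : D x N negative decay rates (diagonal state matrix per channel)
    W_delta: D x D weights giving a per-channel, input-dependent step size
    W_B    : N x D weights giving the input-dependent input projection
    W_C    : N x D weights giving the input-dependent output projection
    """
    D, N = len(A), len(A[0])
    h = [[0.0] * N for _ in range(D)]  # hidden state carried across tokens
    outputs = []
    for x in tokens:  # single pass, so cost grows linearly with L
        delta = [softplus(dot(W_delta[d], x)) for d in range(D)]
        B = [dot(W_B[n], x) for n in range(N)]
        C = [dot(W_C[n], x) for n in range(N)]
        y = []
        for d in range(D):
            for n in range(N):
                # Decay is at most 1 because A < 0 and delta >= 0.
                decay = math.exp(delta[d] * A[d][n])
                h[d][n] = decay * h[d][n] + delta[d] * B[n] * x[d]
            y.append(dot(C, h[d]))
        outputs.append(y)
    return outputs


def rand_matrix(rows, cols, scale=0.5):
    return [[random.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


random.seed(0)
D, N = 4, 3  # hypothetical token width and state size
A = [[-(n + 1.0) for n in range(N)] for _ in range(D)]
W_delta, W_B, W_C = rand_matrix(D, D), rand_matrix(N, D), rand_matrix(N, D)

proprio_token = rand_matrix(1, D)   # stand-in for the MLP embedding
depth_tokens = rand_matrix(16, D)   # stand-in for CNN patch tokens
tokens = proprio_token + depth_tokens
fused = selective_scan(tokens, A, W_delta, W_B, W_C)
```

Because the step size and the projections `B` and `C` are computed from each token, the recurrence can decide per token how much past state to keep, and the single loop over `tokens` is what keeps the cost linear in sequence length under these assumptions.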
