M$^{2}$GRPO: Mamba-based Multi-Agent Group Relative Policy Optimization for Biomimetic Underwater Robots Pursuit
Yukai Feng, Zhiheng Wu, Zhengxing Wu, Junwen Gu, Junzhi Yu

TL;DR
The paper introduces M$^{2}$GRPO, a novel multi-agent policy optimization framework for biomimetic underwater robots that improves long-horizon decision making, inter-robot coordination, and training stability, outperforming MAPPO and recurrent baselines.
Contribution
It proposes a new Mamba-based group-relative policy optimization method that integrates attention mechanisms and reward normalization for scalable, stable multi-agent pursuit in underwater robots.
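The summary does not spell out the update rule, so the following is only a minimal sketch of the generic GRPO recipe (popularized for language-model fine-tuning), not the authors' implementation. It assumes a "group" is a set of rollouts scored from the same starting state: each return is normalized by the group's mean and standard deviation, and the resulting advantage feeds a PPO-style clipped surrogate. The function names `group_relative_advantages` and `clipped_surrogate` and the clipping value `clip_eps=0.2` are hypothetical.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each return against its own group (assumed GRPO-style reward normalization)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # Rollouts better than the group average get positive advantage, worse ones negative.
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term (to be maximized) for one sample.

    ratio = pi_new(a|o) / pi_old(a|o); clip_eps is a hypothetical default.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical group: episode returns of four pursuit rollouts from the same start state.
returns = [10.0, 6.0, 2.0, 6.0]
adv = group_relative_advantages(returns)
print([round(a, 3) for a in adv])  # [1.414, 0.0, -1.414, 0.0]
```

Because advantages are computed relative to the group rather than from a learned value estimate, their scale stays roughly constant across tasks, which is one plausible reading of how reward normalization supports stable training here.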
Findings
M$^{2}$GRPO outperforms MAPPO and recurrent baselines in pursuit success rate.
The method improves capture efficiency in simulated and real-world experiments.
It reduces training resource demands while maintaining stability and scalability.
Abstract
Traditional policy learning methods in cooperative pursuit face fundamental challenges in biomimetic underwater robots, where long-horizon decision making, partial observability, and inter-robot coordination require both expressiveness and stability. To address these issues, a novel framework called Mamba-based multi-agent group relative policy optimization (M$^{2}$GRPO) is proposed, which integrates a selective state-space Mamba policy with group-relative policy optimization under the centralized-training and decentralized-execution (CTDE) paradigm. Specifically, the Mamba-based policy leverages observation history to capture long-horizon temporal dependencies and exploits attention-based relational features to encode inter-agent interactions, producing bounded continuous actions through normalized Gaussian sampling. To further improve credit assignment without sacrificing stability,…
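The abstract names three policy components: a selective state-space (Mamba) encoder over the observation history, attention over teammates, and bounded actions from Gaussian sampling. The toy sketch below shows one way these pieces can fit together; it is not the paper's architecture. Assumptions: a single scalar observation channel, a simplified discretization where the input term is `delta * B_t * x_t`, scaled dot-product attention for the relational features, and tanh squashing as the "normalized" bounding step. All function names, weights, and the fusion rule are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); keeps the step size positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def selective_scan(xs: Sequence[float], a: List[float], w_b: List[float],
                   w_c: List[float], w_delta: float, b_delta: float) -> List[float]:
    """Toy single-channel selective state-space recurrence (Mamba-style).

    h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t,  y_t = C_t . h_t,
    where delta_t, B_t and C_t all depend on the current input x_t.
    Negative entries of `a` make exp(delta_t * A) a forgetting factor in (0, 1).
    """
    n = len(a)
    h = [0.0] * n
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)   # input-dependent step size
        b_t = [wb * x for wb in w_b]              # input-dependent input projection
        c_t = [wc * x for wc in w_c]              # input-dependent readout
        h = [math.exp(delta * a[i]) * h[i] + delta * b_t[i] * x for i in range(n)]
        ys.append(sum(c_t[i] * h[i] for i in range(n)))
    return ys


def attention_pool(query: List[float], keys: List[List[float]],
                   values: List[List[float]]) -> List[float]:
    """Scaled dot-product attention of one robot over its teammates' features."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)                              # subtract max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


def sample_bounded_action(mean: List[float], log_std: List[float],
                          rng: random.Random) -> List[float]:
    """Reparameterized Gaussian sample squashed by tanh into (-1, 1)."""
    return [math.tanh(m + math.exp(s) * rng.gauss(0.0, 1.0))
            for m, s in zip(mean, log_std)]


rng = random.Random(0)
history = [0.2, -0.1, 0.4, 0.3]    # hypothetical 1-D observation history of one robot
temporal = selective_scan(history, a=[-1.0, -0.5], w_b=[0.5, 1.0],
                          w_c=[1.0, -0.3], w_delta=1.0, b_delta=0.0)[-1]
relational = attention_pool(query=[1.0, 0.0],
                            keys=[[0.9, 0.1], [-0.2, 0.8]],
                            values=[[0.5, -0.5], [0.1, 0.3]])
# Hypothetical fusion of temporal and relational features into a 2-D action mean.
mean = [temporal + relational[0], relational[1]]
action = sample_bounded_action(mean, log_std=[-1.0, -1.0], rng=rng)
print(action)
```

Squashing with tanh keeps every action component inside fixed actuator limits regardless of the sampled noise; a full training loop would also correct the Gaussian log-probability for the tanh change of variables before using it in a policy-gradient ratio.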
