On the Fundamental Limitations of Decentralized Learnable Reward Shaping in Cooperative Multi-Agent Reinforcement Learning

Aditya Akella

arXiv:2511.00034·cs.MA·November 4, 2025

TL;DR

This paper investigates the limitations of decentralized learnable reward shaping in cooperative multi-agent reinforcement learning, revealing fundamental barriers that prevent decentralized methods from matching centralized approaches in complex tasks.

Contribution

It introduces DMARL-RSA, a decentralized reward shaping system, and empirically demonstrates its limitations compared to centralized training, highlighting key challenges in decentralized multi-agent coordination.
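To make "decentralized reward shaping" concrete, here is a minimal sketch in which each agent owns a private shaping term that is added to its environment reward. The source does not describe how DMARL-RSA is built, so everything here is an assumption for illustration only: the `AgentShaper` class, the linear form of the bonus, the `signal` argument used for updates, and the three agent names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentShaper:
    """Hypothetical per-agent shaping term: bonus = weights . observation."""
    n_features: int
    lr: float = 0.01
    weights: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with a zero bonus so the shaped reward equals the raw reward.
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def bonus(self, obs: List[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, obs))

    def shaped_reward(self, env_reward: float, obs: List[float]) -> float:
        # The agent trains on its environment reward plus its own learned bonus.
        return env_reward + self.bonus(obs)

    def update(self, obs: List[float], signal: float) -> None:
        # Assumed update rule: step the weights along the observation,
        # scaled by a locally available learning signal.
        self.weights = [w + self.lr * signal * x
                        for w, x in zip(self.weights, obs)]


# Decentralized: one independent shaper per agent, with no shared parameters
# and no access to other agents' observations or rewards.
shapers: Dict[str, AgentShaper] = {
    name: AgentShaper(n_features=2)
    for name in ("agent_0", "agent_1", "agent_2")
}

obs = [1.0, 2.0]
before = shapers["agent_0"].shaped_reward(-1.0, obs)
shapers["agent_0"].update(obs, signal=1.0)
after = shapers["agent_0"].shaped_reward(-1.0, obs)
print(before, after)
```

Because each shaper only sees its own agent's data, updating `agent_0` leaves the other agents' shaping untouched, which is the kind of isolation the decentralized setting implies.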

Findings

01. Decentralized reward shaping underperforms centralized methods by over 26 points in average reward.

02. Decentralized methods achieve higher landmark coverage but worse overall task performance.

03. Three barriers identified: non-stationarity, credit assignment complexity, and reward-objective misalignment.

Abstract

Recent advances in learnable reward shaping have shown promise in single-agent reinforcement learning by automatically discovering effective feedback signals. However, the effectiveness of decentralized learnable reward shaping in cooperative multi-agent settings remains poorly understood. We propose DMARL-RSA, a fully decentralized system where each agent learns individual reward shaping, and evaluate it on cooperative navigation tasks in the simple_spread_v3 environment. Despite sophisticated reward learning, DMARL-RSA achieves only -24.20 ± 0.09 average reward, compared to MAPPO with centralized training at 1.92 ± 0.87, a 26.12-point gap. DMARL-RSA performs similarly to simple independent learning (IPPO: -23.19 ± 0.96), indicating that advanced reward shaping cannot overcome fundamental decentralized coordination limitations. Interestingly, decentralized methods achieve…
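The gaps quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the three mean rewards reported above; the dictionary layout and the `gap` helper are illustrative names, and the second number in each pair is stored as reported without assuming what kind of spread it represents.

```python
# Mean reward and reported spread for each method, copied from the abstract.
reported = {
    "MAPPO": (1.92, 0.87),
    "DMARL-RSA": (-24.20, 0.09),
    "IPPO": (-23.19, 0.96),
}


def gap(a: str, b: str) -> float:
    """Mean reward of method a minus mean reward of method b, to 2 decimals."""
    return round(reported[a][0] - reported[b][0], 2)


# Centralized training versus decentralized reward shaping.
centralized_gap = gap("MAPPO", "DMARL-RSA")

# Decentralized reward shaping versus plain independent learning.
shaping_effect = gap("DMARL-RSA", "IPPO")

print(f"MAPPO - DMARL-RSA: {centralized_gap}")
print(f"DMARL-RSA - IPPO:  {shaping_effect}")
```

The first difference reproduces the 26.12-point gap stated in the abstract. The second is about one point, which is the basis for saying DMARL-RSA performs similarly to IPPO.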

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning