When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
Michael Amir, Matteo Bettini, Amanda Prorok

TL;DR
This paper investigates the conditions under which diversity among agents in multi-agent systems leads to higher rewards, combining theoretical analysis with reinforcement learning experiments to identify when heterogeneity is beneficial.
Contribution
The paper provides a theoretical framework linking the curvature of the reward structure to the benefits of heterogeneity, and introduces HetGPS, a method for finding scenarios where diversity improves outcomes in multi-agent reinforcement learning (MARL).
Findings
The curvature of the reward aggregation operators determines whether heterogeneity can increase reward.
For broad reward families, deciding whether diversity helps reduces to a simple convexity test.
HetGPS successfully identifies scenarios where heterogeneity enhances rewards.
Abstract
The success of teams in robotics, nature, and society often depends on the division of labor among diverse specialists; however, a principled explanation for when such diversity surpasses a homogeneous team is still missing. Focusing on multi-agent task allocation problems, we study this question from the perspective of reward design: what kinds of objectives are best suited for heterogeneous teams? We first consider an instantaneous, non-spatial setting where the global reward is built by two generalized aggregation operators: an inner operator that maps the agents' effort allocations on individual tasks to a task score, and an outer operator that merges the task scores into the global team reward. We prove that the curvature of these operators determines whether heterogeneity can increase reward, and that for broad reward families this collapses to a simple convexity test.…
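The two-level reward structure described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' code: the two-agent, two-task setup, the choice of `max` and `mean` as inner operators, `min` as the outer operator, and the helper names `team_reward` and `best_rewards` are all assumptions made for illustration. It brute-forces the best reward a homogeneous team (both agents use the same allocation) and a heterogeneous team (allocations may differ) can reach on a small grid of effort allocations.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Linear inner operator (illustrative): average effort on a task."""
    return sum(values) / len(values)


def team_reward(allocations, inner, outer):
    """allocations[i][j] is the effort agent i spends on task j.

    The inner operator maps the agents' efforts on one task to a task score;
    the outer operator merges the task scores into the team reward.
    """
    n_tasks = len(allocations[0])
    scores = [inner([agent[j] for agent in allocations]) for j in range(n_tasks)]
    return outer(scores)


def best_rewards(inner, outer, steps=10):
    """Best homogeneous and heterogeneous rewards for 2 agents and 2 tasks.

    Each agent splits one unit of effort between the two tasks. Exact
    fractions are used so the comparison is free of rounding error.
    """
    simplex = [(Fraction(k, steps), 1 - Fraction(k, steps)) for k in range(steps + 1)]
    homogeneous = max(team_reward([r, r], inner, outer) for r in simplex)
    heterogeneous = max(
        team_reward([r1, r2], inner, outer) for r1, r2 in product(simplex, repeat=2)
    )
    return homogeneous, heterogeneous


# Inner operator `max` rewards concentrated effort: specialists do better.
homo, het = best_rewards(max, min)
print("inner=max :", float(homo), float(het))    # inner=max : 0.5 1.0

# Inner operator `mean` is linear in effort: no gain from differing allocations.
homo, het = best_rewards(mean, min)
print("inner=mean:", float(homo), float(het))    # inner=mean: 0.5 0.5
```

In this toy case the heterogeneous team gains only when the inner operator rewards concentrating effort on a single task; with the linear `mean` operator the task scores always sum to one, so no allocation pattern can push the smaller score above 0.5. This is a hand-picked example of the kind of curvature effect the abstract refers to, not a statement of the paper's theorems.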
Peer Reviews
Decision: ICLR 2026 Poster
* The problem setting is interesting and relevant. The question of whether or not to share parameters in Multi-Agent RL is relevant and the analysis of aggregation functions in this paper provides a useful step towards answering it.
* The theoretical analysis and its presentation are very clear and well explained.
* The experiments nicely confirm the theoretical predictions.
Minor points:
* The assumption of normalized inner aggregators could be justified better. It's not entirely clear to me whether this is justified in practice.
Recasting the choice among diverse reward allocations as a mathematical curvature question, using Schur-convex/concave tools, is novel. The theorems and constructive counterexamples are clean, with explicit assumptions. The algorithm description and experiment analysis are clearly presented. This work is significant for environment/reward design and architecture choices in MARL.
1. Results hinge on symmetry/coordinate-wise monotonicity and near constant-sum task scores. It would be good to tabulate common benchmarks that violate these assumptions and provide bounds or heuristics for the reward difference when constant-sum fails.
2. Longer-horizon Dec-POMDP dynamics may interact with curvature in nontrivial ways; more systematic ablations or counterexamples would strengthen the claim.
3. Figures all consist of 9 cases, making it difficult to distinguish the lines.
1. This paper is well-written and easy to follow. The authors provide sufficient supplementary material, making the conclusions of this paper clearer and more convincing.
2. This paper provides a rigorous, formal theory for predicting when behavioral heterogeneity is advantageous in multi-agent task allocation problems. This theoretical framework, based on the curvature of reward aggregation operators (Schur-convexity/concavity), moves the selection of diversity from ad-hoc heuristics to a princi
1. The core theoretical criterion for heterogeneity gain is based solely on the curvature of the reward function (Schur-convexity/concavity). This analysis is inherently restricted to the reward structure and does not formally integrate the complexity of environment dynamics.
2. The high efficiency and tractability of the HetGPS algorithm fundamentally rely on the assumption of an end-to-end differentiable simulator. This is required to compute the exact environment gradients via backpropagation
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI
