When Is Diversity Rewarded in Cooperative Multi-Agent Learning?

Michael Amir; Matteo Bettini; Amanda Prorok

arXiv:2506.09434·cs.MA·March 3, 2026

Open Access 3 Reviews

TL;DR

This paper investigates the conditions under which diversity among agents in multi-agent systems leads to higher rewards, combining theoretical analysis with reinforcement learning experiments to identify when heterogeneity is beneficial.

Contribution

It provides a theoretical framework linking reward structure curvature to heterogeneity benefits and introduces HetGPS, a method to find scenarios where diversity improves outcomes in MARL.

Findings

01 · Curvature of reward aggregation operators determines heterogeneity benefits.

02 · Convexity of reward functions simplifies the assessment of diversity advantages.

03 · HetGPS successfully identifies scenarios where heterogeneity enhances rewards.

Abstract

The success of teams in robotics, nature, and society often depends on the division of labor among diverse specialists; however, a principled explanation for when such diversity surpasses a homogeneous team is still missing. Focusing on multi-agent task allocation problems, we study this question from the perspective of reward design: what kinds of objectives are best suited for heterogeneous teams? We first consider an instantaneous, non-spatial setting where the global reward is built by two generalized aggregation operators: an inner operator that maps the $N$ agents' effort allocations on individual tasks to a task score, and an outer operator that merges the $M$ task scores into the global team reward. We prove that the curvature of these operators determines whether heterogeneity can increase reward, and that for broad reward families this collapses to a simple convexity test.…
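The two-level reward described in the abstract can be made concrete with a toy calculation. The sketch below is an illustration only, not the paper's construction or its HetGPS method: it assumes N = 2 agents and M = 2 tasks, an unnormalized power-sum inner operator with exponent `p`, and a minimum as the outer operator. The names `team_reward` and `heterogeneity_gain` are hypothetical. It compares, by grid search, the best reward when both agents must share one allocation against the best reward when each agent may choose its own.

```python
import itertools
import numpy as np


def team_reward(alloc, p):
    """Global reward for an (N, M) allocation matrix; each row sums to 1.

    Inner operator (assumed): power sum over agents, per task.
    Outer operator (assumed): minimum over the M task scores.
    """
    task_scores = (alloc ** p).sum(axis=0)
    return task_scores.min()


def heterogeneity_gain(p, grid_size=101):
    """Best heterogeneous reward minus best homogeneous reward (2 agents, 2 tasks)."""
    grid = np.linspace(0.0, 1.0, grid_size)

    def row(t):
        # One agent's effort split: t on task 1, the rest on task 2.
        return np.array([t, 1.0 - t])

    # Homogeneous team: both agents use the same allocation.
    best_hom = max(team_reward(np.stack([row(t), row(t)]), p) for t in grid)
    # Heterogeneous team: each agent picks its own allocation.
    best_het = max(
        team_reward(np.stack([row(t1), row(t2)]), p)
        for t1, t2 in itertools.product(grid, grid)
    )
    return best_het - best_hom


convex_gain = heterogeneity_gain(p=2.0)   # convex inner operator
concave_gain = heterogeneity_gain(p=0.5)  # concave inner operator
```

In this toy case the convex inner operator (`p=2.0`) yields a gain of about 0.5, reached when each agent specializes in a different task, while the concave one (`p=0.5`) yields a gain of zero up to floating-point error, because an even split shared by both agents is already optimal.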

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 10 · Confidence 3

Strengths

* The problem setting is interesting and relevant. The question of whether or not to share parameters in Multi-Agent RL is relevant and the analysis of aggregation functions in this paper provides a useful step towards answering it.
* The theoretical analysis and its presentation are very clear and well explained.
* The experiments nicely confirm the theoretical predictions.

Weaknesses

Minor points:
* The assumption of normalized inner aggregators could be justified better. It's not entirely clear to me whether this is justified in practice.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

Recasting diverse reward allocation choices as a mathematical curvature question via Schur-convex/concave tools is novel. The theorems and constructive counterexamples are clean, with explicit assumptions. The algorithm description and experiment analysis are clearly presented. This work is significant for environment/reward design and architecture choices in MARL.

Weaknesses

1. Results hinge on symmetry/coordinate-wise monotonicity and near constant-sum task scores. It would be good to tabulate common benchmarks that violate these assumptions and provide bounds or heuristics for the reward difference when constant-sum fails.
2. Longer-horizon Dec-POMDP dynamics may interact with curvature in nontrivial ways; more systematic ablations or counterexamples would strengthen the claim.
3. Figures all consist of 9 cases, making it difficult to distinguish the lines.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. This paper is well-written and easy to follow. The authors provide sufficient supplementary material, making the conclusions of this paper clearer and more convincing.
2. This paper provides a rigorous, formal theory for predicting when behavioral heterogeneity is advantageous in multi-agent task allocation problems. This theoretical framework, based on the curvature of reward aggregation operators (Schur-convexity/concavity), moves the selection of diversity from ad-hoc heuristics to a princi

Weaknesses

1. The core theoretical criterion for heterogeneity gain is based solely on the curvature of the reward function (Schur-convexity/concavity). This analysis is inherently restricted to the reward structure and does not formally integrate the complexity of environment dynamics.
2. The high efficiency and tractability of the HetGPS algorithm fundamentally rely on the assumption of an end-to-end differentiable simulator. This is required to compute the exact environment gradients via backpropagation

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI