Multi-Agent Reinforcement Learning with Submodular Reward

Wenjing Chen; Chengyuan Qian; Shuo Xing; Yi Zhou; Victoria Crawford

arXiv:2603.06810·cs.LG·March 10, 2026

TL;DR

This paper introduces a formal framework for cooperative multi-agent reinforcement learning with submodular rewards, providing algorithms with provable guarantees on efficiency and regret, addressing realistic scenarios with overlapping agent contributions.

Contribution

The paper is, by the authors' account, the first to formalize submodular reward structures in MARL, and it develops algorithms with theoretical guarantees for both known and unknown dynamics.

Findings

01

With known dynamics, greedy policy optimization achieves a 1/2-approximation with complexity polynomial in the number of agents $K$.

02

With unknown dynamics, a UCB-based algorithm achieves a 1/2-regret of $O(H^{2} K S \sqrt{A T})$.

03

Addresses realistic scenarios with overlapping agent contributions.
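
For readers unfamiliar with the two notions the findings rely on, the standard textbook definitions are sketched below. The notation is an assumption made for illustration and may differ from the paper's: $f$ is a set function on a ground set $\Omega$, $V^{*}$ is the optimal joint value, $V^{\pi_t}$ is the value of the policy played in episode $t$, and $\alpha = 1/2$ in the findings above.

$$
f(X \cup \{e\}) - f(X) \;\ge\; f(Y \cup \{e\}) - f(Y)
\qquad \text{for all } X \subseteq Y \subseteq \Omega,\; e \in \Omega \setminus Y
$$

$$
\mathrm{Reg}_{\alpha}(T) \;=\; \sum_{t=1}^{T} \left( \alpha\, V^{*} - V^{\pi_t} \right)
$$

The first inequality is the diminishing-returns condition: adding an element to a smaller set helps at least as much as adding it to a larger one. The second measures regret against an $\alpha$ fraction of the optimum rather than the optimum itself, which is the usual benchmark when even the offline problem can only be approximated efficiently.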

Abstract

In this paper, we study cooperative multi-agent reinforcement learning (MARL) where the joint reward exhibits submodularity, a natural property capturing diminishing marginal returns when adding agents to a team. Unlike standard MARL with additive rewards, submodular rewards model realistic scenarios where agent contributions overlap (e.g., multi-drone surveillance, collaborative exploration). We provide the first formal framework for this setting and develop algorithms with provable guarantees on sample efficiency and regret. For known dynamics, our greedy policy optimization achieves a $1/2$-approximation with complexity polynomial in the number of agents $K$, overcoming the exponential curse of dimensionality inherent in joint policy optimization. For unknown dynamics, we propose a UCB-based learning algorithm achieving a $1/2$-regret of $O(H^{2} K S \sqrt{A T})$ over $T$ …
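
The overlap effect and the agent-by-agent greedy idea can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a single-step setting with no MDP dynamics, a coverage reward (number of distinct cells observed), and a hypothetical two-agent instance in which each agent picks one option. All function names and data are invented for illustration. In this simplified setting, sequential greedy selection is the classical greedy for monotone submodular maximization with one choice per agent, which is known to reach at least half the optimum.

```python
from itertools import product


def coverage(chosen_sets):
    """Joint reward: number of distinct cells covered by all chosen sets."""
    covered = set()
    for cells in chosen_sets:
        covered |= cells
    return len(covered)


def marginal_gain(base, new_set):
    """Extra reward from adding new_set to the sets already in base."""
    return coverage(base + [new_set]) - coverage(base)


def greedy_assignment(options_per_agent):
    """Fix agents one at a time; each takes the option with the largest
    marginal gain given the choices of the agents before it."""
    picked = []
    for options in options_per_agent:
        best = max(options, key=lambda opt: marginal_gain(picked, opt))
        picked.append(best)
    return picked


def brute_force_value(options_per_agent):
    """Optimal joint reward by enumerating every joint choice
    (exponential in the number of agents)."""
    return max(coverage(list(joint)) for joint in product(*options_per_agent))


# Hypothetical instance: two drones, each choosing one patrol region.
options_per_agent = [
    [frozenset({1, 2, 3}), frozenset({4, 5})],  # agent 1
    [frozenset({1, 2, 3}), frozenset({4})],     # agent 2
]

# Diminishing returns: the same region is worth less once part of it is covered.
gain_alone = marginal_gain([], frozenset({3, 4}))
gain_after = marginal_gain([frozenset({1, 2, 3})], frozenset({3, 4}))

greedy_value = coverage(greedy_assignment(options_per_agent))
optimal_value = brute_force_value(options_per_agent)
print(gain_alone, gain_after, greedy_value, optimal_value)
```

On this instance the greedy choice is suboptimal (agent 1 takes the larger region, leaving agent 2 with little new to cover) but stays within the factor-of-two bound. The greedy loop evaluates each agent's options once, while the brute-force search enumerates every joint choice, which is the scaling gap the abstract refers to.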

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Game Theory and Applications