Multi-Agent Reinforcement Learning with Submodular Reward
Wenjing Chen, Chengyuan Qian, Shuo Xing, Yi Zhou, Victoria Crawford

TL;DR
This paper introduces a formal framework for cooperative multi-agent reinforcement learning (MARL) with submodular rewards, which model realistic scenarios where agent contributions overlap. It provides algorithms with provable guarantees on computational efficiency and regret.
Contribution
The paper is the first to formalize submodular reward structures in MARL and to develop algorithms with theoretical guarantees under both known and unknown dynamics.
Findings
For known dynamics, the greedy policy achieves a 1/2-approximation with complexity polynomial in the number of agents.
For unknown dynamics, the UCB-based algorithm achieves a 1/2-regret of O(H^2KS√ATS).
Addresses realistic scenarios with overlapping agent contributions.
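The two properties behind the findings above can be written out as follows. The first line is the standard definition of a submodular set function. The second is the usual notion of α-regret with α = 1/2, which is assumed here and is not quoted from the paper; the symbols V^*, V^{π_t}, and T (number of episodes) are illustrative notation.

```latex
% Submodularity (diminishing marginal returns): for all A \subseteq B \subseteq V and e \notin B,
f(A \cup \{e\}) - f(A) \;\ge\; f(B \cup \{e\}) - f(B).

% Assumed standard 1/2-regret over T episodes, comparing against half the optimal value:
\mathrm{Reg}_{1/2}(T) \;=\; \sum_{t=1}^{T} \left( \tfrac{1}{2}\, V^{*} - V^{\pi_t} \right).
```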
Abstract
In this paper, we study cooperative multi-agent reinforcement learning (MARL) where the joint reward exhibits submodularity, a natural property that captures diminishing marginal returns when adding agents to a team. Unlike standard MARL with additive rewards, submodular rewards model realistic scenarios where agent contributions overlap (e.g., multi-drone surveillance, collaborative exploration). We provide the first formal framework for this setting and develop algorithms with provable guarantees on sample efficiency and regret. For known dynamics, our greedy policy optimization achieves a 1/2-approximation with polynomial complexity in the number of agents, overcoming the exponential curse of dimensionality inherent in joint policy optimization. For unknown dynamics, we propose a UCB-based learning algorithm achieving a 1/2-regret of O(H^2KS√ATS) over …
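The greedy idea described in the abstract can be illustrated with a toy, single-step sketch. This is not the paper's algorithm, which operates on policies over an MDP; it only shows agent-by-agent greedy selection under a coverage reward, a classic monotone submodular function. The function names and the two-agent instance are hypothetical, chosen so that greedy is strictly suboptimal yet still within a factor 1/2 of the brute-force optimum.

```python
from itertools import product


def coverage(chosen_sets):
    """Joint reward: number of distinct cells covered (monotone submodular)."""
    covered = set()
    for cells in chosen_sets:
        covered |= cells
    return len(covered)


def greedy_assignment(options):
    """Pick one option per agent, in order, maximizing the marginal gain.

    options[k] is the list of candidate cell sets for agent k (hypothetical
    toy input). Evaluates sum_k len(options[k]) joint rewards in total.
    """
    chosen = []
    for agent_options in options:
        base = coverage(chosen)
        best = max(agent_options, key=lambda s: coverage(chosen + [s]) - base)
        chosen.append(best)
    return chosen


def brute_force_value(options):
    """Optimal joint reward by enumerating every joint choice (exponential)."""
    return max(coverage(list(joint)) for joint in product(*options))


# Toy instance: agent 0 greedily grabs cells {1, 2}, leaving agent 1 only
# options that overlap with what is already covered.
options = [
    [frozenset({1, 2}), frozenset({3})],
    [frozenset({1, 2}), frozenset({1})],
]

greedy_value = coverage(greedy_assignment(options))
optimal_value = brute_force_value(options)

# Diminishing returns: the same set adds less once more is already covered.
gain_alone = coverage([frozenset({1, 2})]) - coverage([])
gain_after = coverage([frozenset({1}), frozenset({1, 2})]) - coverage([frozenset({1})])

print(greedy_value, optimal_value, gain_alone, gain_after)
```

The greedy pass evaluates a number of joint rewards that grows with the sum of the agents' option counts, whereas the brute-force search grows with their product, which is the contrast the abstract draws between polynomial and exponential dependence on the number of agents.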
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Game Theory and Applications
