Multi-Agent Combinatorial-Multi-Armed-Bandit framework for the Submodular Welfare Problem under Bandit Feedback
Subham Pokhriyal, Shweta Jain, Vaneet Aggarwal

TL;DR
This paper introduces a multi-agent combinatorial bandit framework for the submodular welfare problem, providing the first regret guarantees under bandit feedback for partition-based utilities with shared constraints.
Contribution
The paper extends classical submodular welfare algorithms to a multi-agent bandit setting with shared allocation constraints and proposes an explore-then-commit strategy with an accompanying regret bound.
Findings
Achieves a $\tilde{O}(T^{2/3})$ regret bound.
First regret guarantee for partition-based submodular welfare under bandit feedback.
Framework handles coupled multi-agent allocation constraints.
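A back-of-the-envelope calculation shows where a $T^{2/3}$ rate typically comes from in explore-then-commit schemes. This is the standard heuristic, not the paper's proof. The symbols $m$ (number of exploration rounds) and $\varepsilon(m)$ (per-round estimation error after exploring) are assumptions introduced here, rewards are assumed bounded in $[0,1]$, and dependence on the number of items and agents is ignored.

$$
R(T) \;\lesssim\; \underbrace{m}_{\text{exploration cost}} \;+\; \underbrace{T\,\varepsilon(m)}_{\text{commit-phase error}},
\qquad \varepsilon(m) \approx \sqrt{\frac{\log T}{m}} .
$$

Setting the derivative of $m + T\sqrt{\log T / m}$ with respect to $m$ to zero gives $m^{3/2} = \tfrac{1}{2}\,T\sqrt{\log T}$, so

$$
m \;\propto\; T^{2/3}(\log T)^{1/3}
\quad\Longrightarrow\quad
R(T) = O\!\left(T^{2/3}(\log T)^{1/3}\right) = \tilde{O}\!\left(T^{2/3}\right).
$$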
Abstract
We study the \emph{Submodular Welfare Problem} (SWP), where items are partitioned among agents with monotone submodular utilities to maximize the total welfare under \emph{bandit feedback}. Classical SWP assumes full value-oracle access, achieving approximation guarantees via continuous-greedy algorithms. We extend this to a \emph{multi-agent combinatorial bandit} framework (\textsc{MA-CMAB}), where actions are partitions under full-bandit feedback with non-communicating agents. Unlike prior single-agent or separable multi-agent CMAB models, our setting couples agents through shared allocation constraints. We propose an explore-then-commit strategy with randomized assignments, achieving $\tilde{O}(T^{2/3})$ regret against a benchmark, the first such guarantee for the partition-based submodular welfare problem under bandit feedback.
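The explore-then-commit idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' algorithm. It explores with uniformly random partitions, observes only the noisy total welfare of each partition (full-bandit feedback), and then commits to the partition that assigns each item to the agent with the highest average observed welfare. The names `coverage_welfare`, `explore_then_commit`, and `pull`, the toy coverage utilities, the per-pair averaging rule, and the choice of roughly $T^{2/3}$ exploration rounds are all assumptions made for this example.

```python
import random


def coverage_welfare(assignment, skills):
    """Toy monotone submodular welfare (hypothetical, for illustration only).

    assignment[i] is the agent that receives item i. skills[agent][item] is a
    set of tags; an agent's utility is the number of distinct tags it covers.
    """
    covered = {}
    for item, agent in enumerate(assignment):
        covered.setdefault(agent, set()).update(skills[agent][item])
    return sum(len(tags) for tags in covered.values())


def explore_then_commit(n_items, n_agents, horizon, pull, rng):
    """Explore with random partitions, then commit to one partition.

    pull(assignment) returns a single noisy scalar: the total welfare of the
    played partition. No per-agent or per-item feedback is used.
    """
    # Assumed exploration length, on the order of horizon^(2/3).
    m = max(1, min(horizon, round(horizon ** (2 / 3))))
    totals = [[0.0] * n_agents for _ in range(n_items)]
    counts = [[0] * n_agents for _ in range(n_items)]
    history = []

    # Exploration: every item is assigned to an agent uniformly at random.
    for _ in range(m):
        assignment = [rng.randrange(n_agents) for _ in range(n_items)]
        reward = pull(assignment)
        for item, agent in enumerate(assignment):
            totals[item][agent] += reward
            counts[item][agent] += 1
        history.append(assignment)

    def mean(item, agent):
        c = counts[item][agent]
        return totals[item][agent] / c if c else float("-inf")

    # Commit: a per-pair averaging surrogate (an assumption of this sketch).
    committed = [
        max(range(n_agents), key=lambda a, i=i: mean(i, a))
        for i in range(n_items)
    ]
    for _ in range(horizon - m):
        pull(committed)
        history.append(committed)
    return committed, m, history


# Usage on a tiny instance with 2 agents and 3 items.
rng = random.Random(0)
skills = [
    [{"a"}, {"a", "b"}, {"c"}],   # tags agent 0 gains from items 0, 1, 2
    [{"x"}, {"x"}, {"x", "y"}],   # tags agent 1 gains from items 0, 1, 2
]
noisy_pull = lambda a: coverage_welfare(a, skills) + rng.gauss(0, 0.1)
committed, m, history = explore_then_commit(3, 2, 1000, noisy_pull, rng)
print("exploration rounds:", m, "committed partition:", committed)
```

The per-pair average treats welfare as if it were additive across item-agent pairs, which a submodular objective is not in general, so this sketch carries no approximation or regret guarantee. It is meant only to show the two-phase structure and the kind of feedback available in the full-bandit setting.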
Taxonomy
Topics: Advanced Bandit Algorithms Research · Complexity and Algorithms in Graphs · Auction Theory and Applications
