Scaling Multiagent Systems with Process Rewards
Ed Li, Junyu Ren, Cat Yan

TL;DR
This paper introduces MAPPA, a method for finetuning multiagent systems with per-action process rewards from AI feedback. Assigning credit to individual agent actions improves both sample efficiency and performance on complex tasks.
Contribution
The paper presents MAPPA, a novel approach that assigns a process reward to each agent action during multiagent finetuning, improving learning efficiency and effectiveness without requiring ground-truth labels.
Findings
Achieves +5.0 to +17.5pp on unseen AIME math problems and +7.8 to +17.2pp on AMC
Improves success rate by +16.7pp on data analysis tasks
Improves performance in both evaluated domains: competition math and tool-augmented data analysis
Abstract
While multiagent systems have shown promise for tackling complex tasks via specialization, finetuning multiple agents simultaneously faces two key challenges: (1) credit assignment across agents, and (2) sample efficiency of expensive multiagent rollouts. In this work, we propose finetuning multiagent systems with per-action process rewards from AI feedback (MAPPA) to address both. By assigning credit to individual agent actions rather than only at task completion, MAPPA enables fine-grained supervision without ground-truth labels while extracting maximal training signal from each rollout. We demonstrate our approach on competition math problems and tool-augmented data analysis tasks. On unseen math problems, MAPPA achieves +5.0 to +17.5pp on AIME and +7.8 to +17.2pp on AMC. For data analysis tasks, our method improves success rate by +16.7pp while quality metrics improve by up to 47%,…
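To make the contrast between outcome-only and per-action credit concrete, here is a minimal sketch under stated assumptions. It is not MAPPA's implementation: the `Step` record, the agent names, and `toy_judge` are hypothetical, and `toy_judge` is a trivial keyword rule standing in for the AI-feedback judge described above. With outcome-only supervision every action in a failed rollout receives the same reward; with per-action scoring each action gets its own signal, so one rollout yields several training signals and the failing agent can be singled out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    agent: str   # hypothetical agent role, e.g. "planner" or "coder"
    action: str  # the text or tool call the agent produced


def outcome_credit(trajectory: list[Step], final_reward: float) -> list[float]:
    """Outcome-only supervision: every action inherits the single task-level reward."""
    return [final_reward for _ in trajectory]


def process_credit(trajectory: list[Step],
                   judge: Callable[[list[Step], Step], float]) -> list[float]:
    """Per-action supervision: a judge scores each action given the steps before it."""
    return [judge(trajectory[:i], step) for i, step in enumerate(trajectory)]


def per_agent_totals(trajectory: list[Step], rewards: list[float]) -> dict[str, float]:
    """Sum the rewards each agent received, showing where credit (or blame) lands."""
    totals: dict[str, float] = {}
    for step, r in zip(trajectory, rewards):
        totals[step.agent] = totals.get(step.agent, 0.0) + r
    return totals


def toy_judge(context: list[Step], step: Step) -> float:
    """Hypothetical stand-in for an AI-feedback judge: penalize actions mentioning an error."""
    return -1.0 if "error" in step.action else 1.0


traj = [
    Step("planner", "split task into load + analyze"),
    Step("coder", "df = load('data.csv')"),
    Step("coder", "raise error: wrong column"),
    Step("reporter", "summarize results"),
]

print(outcome_credit(traj, 0.0))   # [0.0, 0.0, 0.0, 0.0]
rewards = process_credit(traj, toy_judge)
print(rewards)                     # [1.0, 1.0, -1.0, 1.0]
print(per_agent_totals(traj, rewards))
```

The outcome-only list carries no information about which agent caused the failure, while the per-action list isolates the faulty `coder` step. In a real system the judge would be a model scoring each action in context, and these per-action rewards would feed a policy-gradient update for the corresponding agent.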
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
