Scaling Multiagent Systems with Process Rewards

Ed Li; Junyu Ren; Cat Yan

arXiv:2601.23228 · cs.AI · February 5, 2026

TL;DR

This paper introduces MAPPA, a method for finetuning multiagent systems with per-action process rewards from AI feedback. It improves performance on complex tasks through fine-grained credit assignment and better sample efficiency.

Contribution

MAPPA assigns a process reward to each individual agent action during multiagent finetuning. This provides fine-grained supervision without ground truth labels and extracts more training signal from each rollout.

Findings

01 Achieves +5.0–17.5pp on AIME math problems

02 Improves success rate by +16.7pp on data analysis tasks

03 Enhances performance across diverse multiagent domains

Abstract

While multiagent systems have shown promise for tackling complex tasks via specialization, finetuning multiple agents simultaneously faces two key challenges: (1) credit assignment across agents, and (2) sample efficiency of expensive multiagent rollouts. In this work, we propose finetuning multiagent systems with per-action process rewards from AI feedback (MAPPA) to address both. Through assigning credit to individual agent actions rather than only at task completion, MAPPA enables fine-grained supervision without ground truth labels while extracting maximal training signal from each rollout. We demonstrate our approach on competition math problems and tool-augmented data analysis tasks. On unseen math problems, MAPPA achieves +5.0–17.5pp on AIME and +7.8–17.2pp on AMC. For data analysis tasks, our method improves success rate by +16.7pp while quality metrics improve by up to 47%,…
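The contrast the abstract draws between rewarding individual actions and rewarding only task completion can be illustrated with a small sketch. This is not the authors' implementation: the `Action` type, the helper functions, the three-agent rollout, and `toy_judge` are all hypothetical, and `toy_judge` is a trivial keyword rule standing in for an AI-feedback scorer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    agent: str  # which agent produced this action (hypothetical roles)
    text: str   # the action content, e.g. a message or tool call


Judge = Callable[[List[Action], Action], float]


def outcome_only_rewards(rollout: List[Action], final_reward: float) -> List[float]:
    """Baseline: every action inherits the single end-of-task reward."""
    return [final_reward for _ in rollout]


def per_action_rewards(rollout: List[Action], judge: Judge) -> List[float]:
    """Score each action given only the actions that preceded it."""
    return [judge(rollout[:i], action) for i, action in enumerate(rollout)]


def group_by_agent(rollout: List[Action], rewards: List[float]) -> Dict[str, List[float]]:
    """Collect rewards per agent so each agent can be trained on its own signal."""
    grouped: Dict[str, List[float]] = {}
    for action, reward in zip(rollout, rewards):
        grouped.setdefault(action.agent, []).append(reward)
    return grouped


def toy_judge(history: List[Action], action: Action) -> float:
    """Assumption: a keyword rule standing in for an AI judge, not a real model."""
    return 0.0 if "error" in action.text.lower() else 1.0


# Hypothetical failed rollout: the task as a whole did not succeed.
rollout = [
    Action("planner", "Split the problem into two cases."),
    Action("solver", "Case 1 solved; error in case 2 arithmetic."),
    Action("verifier", "Flagged case 2 and requested a retry."),
]

process = per_action_rewards(rollout, toy_judge)
outcome = outcome_only_rewards(rollout, final_reward=0.0)

print(group_by_agent(rollout, process))
print(group_by_agent(rollout, outcome))
```

With outcome-only rewards, all three agents receive the same signal from the failed rollout. With per-action scores, the same rollout separates the solver's faulty step from the planner's and verifier's useful ones, which is the kind of credit assignment the abstract describes.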

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning