Who Gets the Reward, Who Gets the Blame? Evaluation-Aligned Training Signals for Multi-LLM Agents

Chih-Hsuan Yang, Tanwi Mallick, Le Chen, Krishnan Raghavan, Azton Wells, Amal Gueroudji, Ian T. Foster, and Rajeev Thakur

arXiv:2511.10687 · cs.MA · November 19, 2025

Open Access

TL;DR

This paper introduces a theoretical framework that connects system-level evaluation with agent-level and message-level learning in multi-LLM systems, producing local, signed, and credit-conserving training signals to improve cooperation and fault localization.

Contribution

It presents a novel theoretical foundation unifying game-theoretic attribution with process reward modeling for multi-LLM training signals, enabling principled, local supervision from system evaluation.

Findings

01. In success cases, Shapley-based credit assignment fairly allocates outcome credit across agents.

02. Per-message rewards promote cooperation and discourage redundancy or sabotage.

03. In failure cases, first-error localization helps penalize harmful steps and reward corrections.

Abstract

Large Language Models (LLMs) in multi-agent systems (MAS) have shown promise for complex tasks, yet current training methods lack principled ways to connect system-level evaluation with agent-level and message-level learning. We propose a theoretical framework that unifies cooperative game-theoretic attribution with process reward modeling to transform system evaluation into agent credit and then into response-level signals. Unlike prior approaches that rely only on attribution (e.g., Shapley) or step-level labels (e.g., PRM), our method produces local, signed, and credit-conserving signals. In success cases, Shapley-based credit assignment fairly allocates outcomes across agents and is refined into per-message rewards that promote cooperation while discouraging redundancy or sabotage. In failure cases, first-error localization yields repair-aware preferences that penalize harmful steps…
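As a rough illustration of the "credit-conserving" property of Shapley attribution mentioned in the abstract, the following minimal sketch computes exact Shapley values for a three-agent system. The agent names (`planner`, `coder`, `critic`), the coalition scores in `SCORES`, and the helper `shapley_values` are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's refinement of agent credit into per-message rewards is not shown here.

```python
from itertools import permutations
from math import isclose


def shapley_values(agents, value):
    """Exact Shapley values: average each agent's marginal contribution
    over every ordering in which the agents could join the coalition."""
    credit = {agent: 0.0 for agent in agents}
    orderings = list(permutations(agents))
    for order in orderings:
        coalition = frozenset()
        for agent in order:
            with_agent = coalition | {agent}
            credit[agent] += value(with_agent) - value(coalition)
            coalition = with_agent
    return {agent: total / len(orderings) for agent, total in credit.items()}


# Hypothetical system-level evaluation score for every subset of agents.
# These numbers are made up for illustration only.
SCORES = {
    frozenset(): 0.0,
    frozenset({"planner"}): 0.2,
    frozenset({"coder"}): 0.3,
    frozenset({"critic"}): 0.0,
    frozenset({"planner", "coder"}): 0.7,
    frozenset({"planner", "critic"}): 0.2,
    frozenset({"coder", "critic"}): 0.5,
    frozenset({"planner", "coder", "critic"}): 1.0,
}

AGENTS = ["planner", "coder", "critic"]
CREDIT = shapley_values(AGENTS, SCORES.__getitem__)

# Efficiency (credit conservation): per-agent credits sum to the
# score of the full system minus the score of the empty coalition.
TOTAL = sum(CREDIT.values())
assert isclose(TOTAL, SCORES[frozenset(AGENTS)] - SCORES[frozenset()], abs_tol=1e-9)
```

In this toy game the full system scores 1.0 and the per-agent credits add up to exactly that amount, which is the sense in which the allocation conserves credit. Exact enumeration scales factorially with the number of agents, so a larger system would presumably need sampling or another approximation.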

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education