Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System

Yanming Li; Xuelin Zhang; WenJie Lu; Ziye Tang; Maodong Wu; Haotian Luo; Tongtong Wu; Zijie Peng; Hongze Mi; Yibo Feng; Naiqiang Tan; Chao Huang; Hong Chen; Li Shen

arXiv:2602.08335 · cs.AI · February 10, 2026

TL;DR

SHARP introduces a Shapley-based credit assignment framework for multi-agent reinforcement learning, improving training stability and performance by accurately attributing individual contributions in complex multi-agent systems.

Contribution

The paper proposes SHARP, a novel Shapley-based hierarchical attribution method that enhances credit assignment and training stability in multi-agent reinforcement learning.
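SHARP's attribution builds on the Shapley value. This page does not say how SHARP defines the value of a coalition of agents or how it estimates Shapley values, so the block below is only a minimal sketch of the textbook exact Shapley computation. It assumes a hypothetical characteristic function `value(S)` that returns the team reward when only the agents in `S` contribute. The agent names and the `toy_value` rewards are illustrative and are not taken from the paper.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(agents: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each agent under the characteristic function `value`.

    phi_i = sum over S not containing i of |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    """
    n = len(agents)
    phi: Dict[str, float] = {}
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes agent i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of agent i when it joins coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical reward: success (1.0) needs both planner and executor;
    a verifier adds 0.2 on top of a successful run."""
    base = 1.0 if {"planner", "executor"} <= coalition else 0.0
    bonus = 0.2 if base and "verifier" in coalition else 0.0
    return base + bonus


if __name__ == "__main__":
    credits = shapley_values(["planner", "executor", "verifier"], toy_value)
    print(credits)
```

In this toy case the planner and executor each receive about 0.567 and the verifier about 0.067, and the three credits sum to the full-team reward of 1.2 (the Shapley efficiency property). Exact enumeration visits every coalition, so its cost grows as 2^N in the number of agents. That is manageable for a handful of functional agents, but larger systems typically rely on sampled approximations; whether SHARP uses exact or approximate values is not stated here.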

Findings

1. SHARP outperforms state-of-the-art baselines by 23.66% and 14.05% on benchmark tasks.
2. The framework stabilizes training by normalizing agent-specific advantages across trajectory groups.
3. Extensive experiments validate SHARP's effectiveness across real-world benchmarks.

Abstract

Integrating Large Language Models (LLMs) with external tools via multi-agent systems offers a promising new paradigm for decomposing and solving complex problems. However, training these systems remains notoriously difficult due to the credit assignment challenge, as it is often unclear which specific functional agent is responsible for the success or failure of decision trajectories. Existing methods typically rely on sparse or globally broadcast rewards, failing to capture individual contributions and leading to inefficient reinforcement learning. To address these limitations, we introduce the Shapley-based Hierarchical Attribution for Reinforcement Policy (SHARP), a novel framework for optimizing multi-agent reinforcement learning via precise credit attribution. SHARP effectively stabilizes training by normalizing agent-specific advantages across trajectory groups, primarily through…
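The abstract states that SHARP normalizes agent-specific advantages across trajectory groups but is cut off before giving the rule. As a hedged sketch, assuming a GRPO-style group standardization (an assumption, not the paper's confirmed formula), each agent's credit in a trajectory is standardized against the same agent's credits in the other trajectories sampled for one query. The function name `group_normalized_advantages` and the `credits` layout are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List


def group_normalized_advantages(credits: List[List[float]],
                                eps: float = 1e-8) -> List[List[float]]:
    """Standardize per-agent credits across one trajectory group.

    Hypothetical layout: credits[g][j] is the credit (e.g. a Shapley value)
    of agent j in trajectory g, where all G trajectories share one query.
    Assumed rule (GRPO-style): A[g][j] = (c[g][j] - mean_j) / (std_j + eps).
    """
    num_agents = len(credits[0])
    adv = [[0.0] * num_agents for _ in credits]
    for j in range(num_agents):
        column = [row[j] for row in credits]   # agent j across the group
        mu, sigma = fmean(column), pstdev(column)
        for g, c in enumerate(column):
            adv[g][j] = (c - mu) / (sigma + eps)
    return adv


if __name__ == "__main__":
    # Three sampled trajectories, two agents (illustrative numbers only).
    group = [[0.6, 0.1], [0.2, 0.1], [0.4, 0.4]]
    for row in group_normalized_advantages(group):
        print([round(a, 3) for a in row])
```

Standardizing each agent's column separately keeps an agent that consistently earns small credit from being swamped by one that earns large credit. The `eps` term only guards against a zero standard deviation: when every trajectory gives an agent the same credit, all of that agent's advantages come out as zero.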

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications