CoFi-PGMA: Counterfactual Policy Gradients under Filtered Feedback for Multi-Agent LLMs
Stela Tong, Elai Ben-Gal

TL;DR
This paper introduces CoFi-PGMA, a unified reinforcement learning framework for multi-agent large language models that corrects for filtered feedback: routing evaluates only the selected response, and collaboration returns a shared reward that hides each agent's contribution.
Contribution
It develops a counterfactual policy gradient method that corrects learning signals under filtered feedback, applicable to both routing and collaborative multi-agent LLM architectures.
Findings
The method improves learning efficiency in multi-agent LLM systems.
Counterfactual estimators enable better credit assignment in filtered feedback scenarios.
The method's effectiveness is demonstrated on a real-world reasoning dataset.
Abstract
Large language model (LLM) deployments increasingly rely on multi-agent architectures in which multiple models either compete through routing mechanisms or collaborate to produce a final answer. In both settings, the learning signal received by each agent is filtered by the system mechanism. Routing produces selection-gated feedback where only the chosen response is evaluated, while collaboration produces shared rewards that obscure the individual contribution of each agent. As a result, standard RLHF objectives designed for a single deployed policy become misspecified. We introduce CoFi-PGMA (Counterfactual Policy Gradients under Filtered Feedback for Multi-Agent LLMs), a unified framework for learning under filtered feedback in multi-agent LLM systems. Our approach derives a counterfactual per-agent training objective based on marginal contribution, which corrects the learning signal…
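The idea of per-agent credit based on marginal contribution can be illustrated with a small sketch. This is not the paper's estimator, whose details are not given in the abstract above; it is a generic leave-one-out computation under stated assumptions. The function `marginal_contributions`, the `toy_reward` scorer, and the agent names are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Mapping

Responses = Mapping[str, str]


def marginal_contributions(
    responses: Responses,
    team_reward: Callable[[Responses], float],
) -> Dict[str, float]:
    """Leave-one-out credit: shared reward with all agents minus the
    reward obtained when one agent's response is removed."""
    full = team_reward(responses)
    credits: Dict[str, float] = {}
    for agent in responses:
        # Counterfactual team: everyone except `agent`.
        without = {k: v for k, v in responses.items() if k != agent}
        credits[agent] = full - team_reward(without)
    return credits


def toy_reward(responses: Responses) -> float:
    # Hypothetical shared reward: 1.0 if any response contains "42".
    return 1.0 if any("42" in r for r in responses.values()) else 0.0


# Hypothetical collaborative episode with three agents.
responses = {
    "solver": "The answer is 42",
    "critic": "Looks plausible",
    "planner": "Compute step by step",
}
credits = marginal_contributions(responses, toy_reward)
print(credits)
```

In this toy episode every agent sees the same shared reward of 1.0, yet only the hypothetical `solver` receives nonzero leave-one-out credit. That is the kind of individual contribution a shared reward obscures.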
