COSAC: Counterfactual Credit Assignment in Sequential Cooperative Teams

Shripad Deshmukh; Jayakumar Subramanian; Raghavendra Addanki; Nikos Vlassis

arXiv:2604.17693·cs.LG·May 12, 2026

TL;DR

COSAC is a critic-free policy gradient method for sequential cooperative multi-agent systems. It targets more efficient and scalable per-agent credit assignment, and the authors report gains on sequential bandits and the ARC task.

Contribution

The paper proposes an additive per-agent reward decomposition and a counterfactual advantage computation that extend the aristocrat utility to sequential teams, with theoretical bias-variance guarantees.

Findings

01 COSAC achieves the lowest advantage MSE in sequential bandits.

02 COSAC converges faster than critic-free baselines on the ARC task.

03 COSAC scales effectively to teams of up to 16 agents.

Abstract

In cooperative teams where agents act in a fixed order and share a single team-level reward (multi-agent language systems, sequential robotic tasks), per-agent credit assignment is under-determined. Critic-based approaches scale poorly as the number of agents grows owing to the costly maintenance of joint/factored critic(s), whereas the existing critic-free alternatives have other issues: common credit across agents that couples every agent's signal to teammate noise, importance-sampling corrections for upstream-update staleness that incur variance exponential in team size, or per-agent counterfactual replay that isolates each agent's effect at the price of extra environment or reward calls. We propose COSAC, a critic-free per-agent policy gradient for sequential cooperative teams. COSAC fits an additive per-agent decomposition of the team reward by a single ridge regression on the…
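To make the idea of an additive per-agent decomposition concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The abstract is truncated before it says what the ridge regression is fit on, so the per-agent feature array, the function names, the ridge strength, and the batch-mean baseline are all assumptions made for illustration only.

```python
import numpy as np


def fit_additive_decomposition(features, team_reward, ridge=1e-2):
    """Fit team_reward ~ sum_i w_i . features[:, i] with one ridge regression.

    features:    (episodes, agents, dim) array of HYPOTHETICAL per-agent
                 features; the source does not say what COSAC regresses on.
    team_reward: (episodes,) array holding the shared team-level reward.
    Returns an (agents, dim) weight array.
    """
    n_episodes, n_agents, dim = features.shape
    # Stack all agents' features side by side so one solve fits every agent.
    X = features.reshape(n_episodes, n_agents * dim)
    gram = X.T @ X + ridge * np.eye(n_agents * dim)
    weights = np.linalg.solve(gram, X.T @ team_reward)
    return weights.reshape(n_agents, dim)


def per_agent_credit(features, weights):
    """Each agent's fitted share of the team reward, shape (episodes, agents)."""
    return np.einsum("ead,ad->ea", features, weights)


def counterfactual_advantage(credit):
    """Subtract each agent's batch-mean share from its own share.

    ASSUMPTION: the batch mean stands in for the expected share under the
    agent's own policy, in the spirit of a difference-style utility. The
    paper's actual counterfactual baseline may differ.
    """
    return credit - credit.mean(axis=0, keepdims=True)
```

Under these assumptions, one batch of episodes and a single linear solve yield a separate learning signal per agent, with no learned critic and no extra environment or reward calls.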
