A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning
Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato

TL;DR
This paper investigates the theoretical properties of state-based critics in multi-agent reinforcement learning, revealing potential biases and variance issues, and empirically evaluates their impact across various benchmarks.
Contribution
It provides the first theoretical analysis of state-based critics, showing that they can bias policy gradient estimates and increase gradient variance, and empirically assesses how these effects play out in multi-agent settings.
Findings
State-based critics can introduce bias in policy gradient estimates.
Using state-based critics can increase gradient variance.
Environmental properties influence the effectiveness of different critic types.
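The first finding can be illustrated with a small hand-built example. The sketch below is not from the paper. The environment, the helper names (`q_hist`, `q_state`, `expected_gradient`, `expected_return`) and the Bernoulli-logit policy are all hypothetical. A single agent sees an uninformative observation o ∈ {A, B} at t=0 while the true state never changes. It picks a0, and at t=1 it follows a fixed history-dependent rule: repeat a0 after A, flip it after B. Reward 1 is paid only if a1 = 1. The history-state critic Q(h, s, a) recovers the exact policy gradient. The state-only critic is modeled, as an assumption, as the ideal average of Q(h, s, a) over the histories consistent with (s, a). Because it cannot tell the two histories apart, it returns a zero gradient.

```python
import math

# Hypothetical toy problem (not from the paper): the true state never changes,
# so any difference between histories is invisible to a state-only critic.
OBS = ("A", "B")
P_OBS = {"A": 0.5, "B": 0.5}  # uninformative observation at t=0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def policy(theta: dict, o: str) -> dict:
    """Bernoulli-logit policy pi(a0 | o) with one logit per observation."""
    p1 = sigmoid(theta[o])
    return {0: 1.0 - p1, 1: p1}


def score(theta: dict, o: str, a0: int) -> float:
    """d/d theta[o] of log pi(a0 | o)."""
    p1 = sigmoid(theta[o])
    return (1.0 - p1) if a0 == 1 else -p1


def q_hist(o: str, a0: int) -> float:
    """History-state value Q(h, s, a0): the fixed t=1 rule repeats a0 after 'A'
    and flips it after 'B'; reward 1 only if a1 == 1."""
    a1 = a0 if o == "A" else 1 - a0
    return 1.0 if a1 == 1 else 0.0


def q_state(theta: dict, a0: int) -> float:
    """Assumed ideal state-only critic: average of Q(h, s, a0) over the
    histories consistent with (s, a0), weighted by P(o) * pi(a0 | o)."""
    w = {o: P_OBS[o] * policy(theta, o)[a0] for o in OBS}
    return sum(w[o] * q_hist(o, a0) for o in OBS) / sum(w.values())


def expected_gradient(theta: dict, critic) -> dict:
    """Exact expectation of critic(o, a0) * grad log pi(a0 | o), per logit."""
    grad = {o: 0.0 for o in OBS}
    for o in OBS:
        for a0, p in policy(theta, o).items():
            grad[o] += P_OBS[o] * p * critic(o, a0) * score(theta, o, a0)
    return grad


def expected_return(theta: dict) -> float:
    """Exact expected return J(theta) of the toy problem."""
    return sum(P_OBS[o] * p * q_hist(o, a0)
               for o in OBS for a0, p in policy(theta, o).items())


theta = {"A": 0.0, "B": 0.0}
g_hist = expected_gradient(theta, q_hist)
g_state = expected_gradient(theta, lambda o, a0: q_state(theta, a0))

# Central finite difference on theta["A"] as a reference for the true gradient.
eps = 1e-5
g_fd_A = (expected_return({**theta, "A": theta["A"] + eps})
          - expected_return({**theta, "A": theta["A"] - eps})) / (2 * eps)
print(g_hist, g_state, g_fd_A)
```

At θ = 0 the history-state critic gives gradients of about 0.125 for θ_A and -0.125 for θ_B, matching the finite-difference reference. The state-only critic gives 0 for both, so following it would leave the policy unchanged even though it is suboptimal.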
Abstract
Centralized Training for Decentralized Execution, where training is done in a centralized offline fashion, has become a popular solution paradigm in Multi-Agent Reinforcement Learning. Many such methods take the form of actor-critic with state-based critics, since centralized training allows access to the true system state, which can be useful during training despite not being available at execution time. State-based critics have become a common empirical choice, albeit one which has had limited theoretical justification or analysis. In this paper, we show that state-based critics can introduce bias in the policy gradient estimates, potentially undermining the asymptotic guarantees of the algorithm. We also show that, even if the state-based critics do not introduce any bias, they can still result in a larger gradient variance, contrary to the common intuition. Finally, we show the…
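The abstract's second claim, that an unbiased state-based critic can still raise gradient variance, can be checked exactly in a one-step toy problem. This is a hypothetical sketch, not one of the paper's experiments. The single history h, the belief P(s | h) = (0.5, 0.5) over two hidden states, the reward table `R`, the one-parameter logit policy and the helper `estimator_moments` are all assumptions. The history critic uses Q(h, a) = Σ_s P(s | h) r(s, a), while the history-state critic uses Q(h, s, a) = r(s, a). Both give the same expected gradient. In this single-step setting, the law of total variance guarantees that the history-state version is at least as noisy, because averaging it over s given (h, a) yields the history critic.

```python
import math

# Hypothetical single-step problem (not the paper's benchmark): one history h,
# belief P(s | h) over two hidden states, two actions, reward r(s, a).
P_S = (0.5, 0.5)
R = ((1.0, 0.0),   # r(s=0, a=0), r(s=0, a=1)
     (0.0, 2.0))   # r(s=1, a=0), r(s=1, a=1)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def estimator_moments(theta: float, use_state: bool) -> tuple:
    """Exact mean and variance of the single-sample gradient estimate
    critic * d/dtheta log pi(a | h), with pi(a=1 | h) = sigmoid(theta)."""
    p1 = sigmoid(theta)
    pi = (1.0 - p1, p1)
    score = (-p1, 1.0 - p1)  # d/dtheta log pi(a | h) for a = 0, 1
    # History critic Q(h, a): the hidden state marginalized out under P(s | h).
    q_h = [sum(P_S[s] * R[s][a] for s in range(2)) for a in range(2)]
    mean = second = 0.0
    for s in range(2):
        for a in range(2):
            prob = P_S[s] * pi[a]
            # History-state critic Q(h, s, a) equals r(s, a) in a one-step problem.
            q = R[s][a] if use_state else q_h[a]
            g = q * score[a]
            mean += prob * g
            second += prob * g * g
    return mean, second - mean ** 2


hist_mean, hist_var = estimator_moments(0.0, use_state=False)
state_mean, state_var = estimator_moments(0.0, use_state=True)
print(hist_mean, hist_var)    # critic Q(h, a)
print(state_mean, state_var)  # critic Q(h, s, a)
```

At θ = 0 both estimators have mean 0.125, which is the true gradient of J(θ) = 0.5 π(0 | h) + π(1 | h). The variance is 0.140625 with Q(h, a) and 0.296875 with Q(h, s, a), so the extra state information roughly doubles the noise here without changing the expected update.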
Taxonomy
Topics: Advanced Memory and Neural Computing · Reinforcement Learning in Robotics
