Partial Action Replacement: Tackling Distribution Shift in Offline MARL
Yue Jin, Giovanni Montana

TL;DR
This paper introduces Partial Action Replacement (PAR), which mitigates distribution shift in offline MARL by updating only a subset of agents' actions, and SPaCQL, a new offline MARL method built on PAR. Both are supported by theoretical bounds and empirical results.
Contribution
The paper proposes PAR and SPaCQL, an approach to offline MARL that reduces distribution shift under factorized behavior policies, backed by theoretical guarantees and empirical validation.
Findings
PAR significantly reduces distribution shift compared to full joint-action updates.
SPaCQL outperforms baseline algorithms on datasets with independent agent actions.
Theoretical bounds show linear scaling of distribution shift with the number of deviating agents.
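The linear-scaling finding can be illustrated with a standard property of KL divergence: for factorized (product) distributions, KL(∏ p_i || ∏ q_i) = Σ KL(p_i || q_i), so if only k agents deviate from their behavior policies, the joint divergence is a sum of k per-agent terms. The sketch below is not the paper's bound, which this summary does not reproduce. It assumes KL divergence as the measure of distribution shift, uses hypothetical two-action policies, and checks additivity by brute-force enumeration of every joint action.

```python
import itertools
import math

def kl_categorical(p, q):
    """KL(p || q) for two categorical distributions over the same action set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def joint_distribution(marginals):
    """Joint distribution of independently acting agents, as {joint_action: probability}."""
    joint = {}
    for actions in itertools.product(*(range(len(m)) for m in marginals)):
        prob = 1.0
        for agent, a in enumerate(actions):
            prob *= marginals[agent][a]
        joint[actions] = prob
    return joint

def joint_kl(p_marginals, q_marginals):
    """KL between two factorized joint distributions, by enumerating every joint action."""
    p_joint = joint_distribution(p_marginals)
    q_joint = joint_distribution(q_marginals)
    return sum(p * math.log(p / q_joint[a]) for a, p in p_joint.items() if p > 0)

n_agents = 4
behavior = [0.5, 0.5]  # hypothetical per-agent behavior policy (2 actions)
learned = [0.9, 0.1]   # hypothetical per-agent learned policy
per_agent_kl = kl_categorical(learned, behavior)

shifts = []
for k in range(n_agents + 1):
    # Partial replacement of k agents: the first k follow the learned policy,
    # the remaining n_agents - k keep the behavior policy.
    mixed = [learned] * k + [behavior] * (n_agents - k)
    shifts.append(joint_kl(mixed, [behavior] * n_agents))
    print(f"k={k}: joint KL = {shifts[-1]:.4f}, k * per-agent KL = {k * per_agent_kl:.4f}")
```

Each row shows the joint KL growing in equal steps of the per-agent KL, so replacing all n agents at once (the full joint-action update) incurs n times the shift of replacing a single agent in this toy setting.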
Abstract
Offline multi-agent reinforcement learning (MARL) is severely hampered by the challenge of evaluating out-of-distribution (OOD) joint actions. Our core finding is that when the behavior policy is factorized (a common scenario in which agents act fully or partially independently during data collection), a strategy of partial action replacement (PAR) can significantly mitigate this challenge. PAR updates the actions of a single agent or a subset of agents while the remaining agents' actions stay fixed to the behavioral data, reducing distribution shift compared to full joint-action updates. Based on this insight, we develop Soft-Partial Conservative Q-Learning (SPaCQL), which uses PAR to mitigate the OOD issue and dynamically weights different PAR strategies based on the uncertainty of value estimation. We provide a rigorous theoretical foundation for this approach, proving that under factorized behavior policies, the induced…
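The PAR idea and the uncertainty-based weighting described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the helper names (`partial_replace`, `soft_weights`, `weighted_par_value`), the set of strategies, the use of a Q-ensemble's standard deviation as the uncertainty measure, and the softmax-over-negative-uncertainty weighting with a `temperature` parameter are all assumptions made for the example. The toy ensemble is built so that its members disagree more as more agents deviate from the dataset.

```python
import math
import statistics

def partial_replace(data_actions, policy_actions, agents_to_replace):
    """PAR: only the listed agents take the policy's action; all other agents
    keep the action recorded in the dataset."""
    joint = list(data_actions)
    for i in agents_to_replace:
        joint[i] = policy_actions[i]
    return tuple(joint)

def soft_weights(uncertainties, temperature=1.0):
    """Softmax over negative uncertainty (assumed scheme): lower uncertainty -> larger weight."""
    logits = [-u / temperature for u in uncertainties]
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_par_value(q_ensemble, state, data_actions, policy_actions,
                       strategies, temperature=1.0):
    """Evaluate each PAR strategy with a Q-ensemble and combine the ensemble means,
    weighting each strategy by its ensemble standard deviation (uncertainty proxy)."""
    means, stds = [], []
    for agents in strategies:
        joint = partial_replace(data_actions, policy_actions, agents)
        qs = [q(state, joint) for q in q_ensemble]
        means.append(statistics.fmean(qs))
        stds.append(statistics.pstdev(qs))
    weights = soft_weights(stds, temperature)
    value = sum(w * m for w, m in zip(weights, means))
    return value, weights

# Toy setup (all hypothetical): 3 agents, binary actions, 5 ensemble members whose
# disagreement grows with the number of agents taking action 1 (i.e. deviating here).
ensemble = [lambda s, a, j=j: float(sum(a)) + 0.1 * j * sum(a) ** 2 for j in range(5)]
data_actions = (0, 0, 0)    # joint action recorded in the dataset
policy_actions = (1, 1, 1)  # actions proposed by the current policy
strategies = [(0,), (1,), (2,), (0, 1, 2)]  # three single-agent PARs + full replacement

value, weights = weighted_par_value(ensemble, None, data_actions, policy_actions,
                                    strategies, temperature=0.5)
print(f"combined value = {value:.3f}")
print("weights:", [round(w, 3) for w in weights])
```

With these toy numbers, the full joint-action replacement, whose ensemble members disagree most, receives a much smaller weight than the single-agent replacements, so the combined value stays close to the estimates obtained by altering one agent at a time.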
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
