Partial Action Replacement: Tackling Distribution Shift in Offline MARL

Yue Jin; Giovanni Montana

arXiv:2511.07629 · cs.LG · November 12, 2025

TL;DR

This paper introduces partial action replacement (PAR) and SPaCQL, an offline MARL method built on it. PAR mitigates distribution shift by replacing the actions of only a subset of agents while the remaining agents' actions stay fixed to the dataset. The approach is supported by theoretical bounds and empirical results.

Contribution

The paper proposes PAR, a strategy that reduces distribution shift in offline MARL when the behavior policy is factorized, and SPaCQL, an algorithm built on PAR, with theoretical guarantees and empirical validation.

Findings

01. PAR significantly reduces distribution shift compared to full joint-action updates.

02. SPaCQL outperforms baseline algorithms in datasets with independent agent actions.

03. Theoretical bounds show linear scaling of distribution shift with the number of deviating agents.
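
The linear scaling in finding 03 is consistent with a standard property of product distributions. As a sketch only (not necessarily the exact bound proved in the paper), assume a factorized behavior policy \mu and a hybrid policy in which only the agents in a set S follow learned policies \pi_i. The symbols \mu_i, \pi_i, S, k, n, and \epsilon are notation introduced here for illustration. Subadditivity of total variation distance over product measures then gives:

```latex
\mu(a \mid s) = \prod_{i=1}^{n} \mu_i(a_i \mid s), \qquad
\pi^{S}(a \mid s) = \prod_{i \in S} \pi_i(a_i \mid s) \prod_{j \notin S} \mu_j(a_j \mid s)

D_{\mathrm{TV}}\big(\pi^{S}(\cdot \mid s),\, \mu(\cdot \mid s)\big)
\;\le\; \sum_{i \in S} D_{\mathrm{TV}}\big(\pi_i(\cdot \mid s),\, \mu_i(\cdot \mid s)\big)
\;\le\; k\,\epsilon,
\qquad k = |S|, \quad
\epsilon = \max_{i \in S} D_{\mathrm{TV}}\big(\pi_i(\cdot \mid s),\, \mu_i(\cdot \mid s)\big)
```

With k = 1 only one agent's deviation enters the bound. With k = n the hybrid policy is the full joint-action update and the bound is at its loosest.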

Abstract

Offline multi-agent reinforcement learning (MARL) is severely hampered by the challenge of evaluating out-of-distribution (OOD) joint actions. Our core finding is that when the behavior policy is factorized (a common scenario where agents act fully or partially independently during data collection), a strategy of partial action replacement (PAR) can significantly mitigate this challenge. PAR updates the actions of a single agent or a subset of agents while the others remain fixed to the behavioral data, reducing distribution shift compared to full joint-action updates. Based on this insight, we develop Soft-Partial Conservative Q-Learning (SPaCQL), which uses PAR to mitigate the OOD issue and dynamically weights different PAR strategies based on the uncertainty of value estimation. We provide a rigorous theoretical foundation for this approach, proving that under factorized behavior policies, the induced…
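
The replacement step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `partial_action_replacement`, the array layout `(batch, n_agents, action_dim)`, and the argument names are all assumptions made for illustration. The sketch shows only how a hybrid joint action is assembled. It does not cover SPaCQL's uncertainty-based weighting of PAR strategies, whose details are not given here.

```python
import numpy as np


def partial_action_replacement(data_actions, policy_actions, replace_idx):
    """Build hybrid joint actions (hypothetical helper, for illustration only).

    Agents listed in `replace_idx` take the actions proposed by the learned
    policy; every other agent keeps the action recorded in the dataset.

    Both action arrays are assumed to have shape (batch, n_agents, action_dim).
    """
    data_actions = np.asarray(data_actions)
    policy_actions = np.asarray(policy_actions)
    if data_actions.shape != policy_actions.shape:
        raise ValueError("data_actions and policy_actions must share a shape")

    # Explicit integer dtype so an empty index list is handled correctly.
    idx = np.asarray(list(replace_idx), dtype=int)

    hybrid = data_actions.copy()  # leave the dataset batch untouched
    hybrid[:, idx, :] = policy_actions[:, idx, :]
    return hybrid


# Usage: 3 agents, replace only agent 1 (a single-agent PAR strategy).
batch_data = np.zeros((2, 3, 1))    # stand-in for actions sampled from the dataset
batch_policy = np.ones((2, 3, 1))   # stand-in for actions from the learned policies
hybrid_actions = partial_action_replacement(batch_data, batch_policy, [1])
```

Passing every agent index reproduces a full joint-action update, while passing an empty list returns the dataset actions unchanged. These are the two extremes between which PAR strategies fall.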

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)