Learning Partial Action Replacement in Offline MARL

Yue Jin; Giovanni Montana

arXiv:2603.28573·cs.LG·March 31, 2026

TL;DR

This paper introduces PLCQL, a framework that learns a state-dependent policy for partial action replacement in offline multi-agent reinforcement learning, improving benchmark performance while reducing Q-function evaluations from n to 1 per iteration.

Contribution

PLCQL formulates partial action replacement (PAR) subset selection as a contextual bandit problem, enabling dynamic, state-dependent agent replacement, with a theoretical value-error bound and lower computational cost.

Findings

01

PLCQL outperforms previous methods on multiple benchmarks.

02

It reduces Q-function evaluations from n to 1 per iteration.

03

It achieves the highest score on 66% of tasks across benchmarks.

Abstract

Offline multi-agent reinforcement learning (MARL) faces a critical challenge: the joint action space grows exponentially with the number of agents, making dataset coverage exponentially sparse and out-of-distribution (OOD) joint actions unavoidable. Partial Action Replacement (PAR) mitigates this by anchoring a subset of agents to dataset actions, but the existing approach relies on enumerating multiple subset configurations at high computational cost and cannot adapt to varying states. We introduce PLCQL, a framework that formulates PAR subset selection as a contextual bandit problem and learns a state-dependent PAR policy using Proximal Policy Optimisation with an uncertainty-weighted reward. This adaptive policy dynamically determines how many agents to replace at each update step, balancing policy improvement against conservative value estimation. We prove a value-error bound showing…
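
The basic PAR operation described in the abstract, keeping some agents anchored to dataset actions while the rest take policy actions, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the array shapes, and the uniform-random choice of which agents to replace are all assumptions made for the example. In PLCQL the number of replaced agents would come from the learned state-dependent PAR policy, which is not modelled here.

```python
import numpy as np


def partial_action_replacement(dataset_actions, policy_actions, replace_mask):
    """Build a mixed joint action.

    Agents flagged True in replace_mask take the policy's action;
    all other agents stay anchored to their dataset action.
    Shapes (assumed): actions are (n_agents, act_dim), mask is (n_agents,).
    """
    dataset_actions = np.asarray(dataset_actions)
    policy_actions = np.asarray(policy_actions)
    replace_mask = np.asarray(replace_mask, dtype=bool)
    # Broadcast the per-agent mask across the action dimension.
    return np.where(replace_mask[:, None], policy_actions, dataset_actions)


def mask_for_k(n_agents, k, rng):
    """Hypothetical helper: pick k of n_agents uniformly at random to replace.

    The source only says the PAR policy decides how many agents to replace;
    selecting which ones at random is an assumption of this sketch.
    """
    mask = np.zeros(n_agents, dtype=bool)
    mask[rng.choice(n_agents, size=k, replace=False)] = True
    return mask


rng = np.random.default_rng(0)
n_agents, act_dim = 3, 2
a_data = np.zeros((n_agents, act_dim))  # stand-in for dataset actions
a_pi = np.ones((n_agents, act_dim))     # stand-in for current-policy actions

mask = mask_for_k(n_agents, k=1, rng=rng)
mixed = partial_action_replacement(a_data, a_pi, mask)
```

With k = 0 the mixed action equals the dataset joint action (fully in-distribution), and with k = n_agents it equals the policy joint action, so k controls the trade-off between conservatism and policy improvement that the abstract refers to.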
