Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning
Andrea Baisero, Rupali Bhati, Shuo Liu, Aathira Pillai, Christopher Amato

TL;DR
This paper introduces QFIX, a new family of value function decomposition models for cooperative multi-agent reinforcement learning that enhances representation capabilities, improves stability, and outperforms existing methods like QPLEX.
Contribution
The paper presents a simple formulation of the full class of IGM values and derives QFIX, expanding prior models with a minimal fixing layer for better performance and stability.
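As a rough illustration of what a fixing layer could look like, the sketch below rescales the advantages of an existing joint value (the "fixee") by strictly positive weights and adds a bias. This is a minimal sketch under assumptions, not the paper's implementation: the names `q_fixee`, `w`, `b`, and `interaction` are hypothetical, the weights are random rather than learned, and a VDN-style sum stands in for the fixee.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fixee: a VDN-style joint value for 2 agents with 3 actions each.
q1, q2 = rng.normal(size=3), rng.normal(size=3)
q_fixee = q1[:, None] + q2[None, :]  # shape (3, 3)

# Advantages of the fixee: zero at its greedy joint action, negative elsewhere.
adv = q_fixee - q_fixee.max()

# Assumed fixing step: strictly positive per-joint-action weights plus a bias.
# In a real model these would be network outputs; here they are random.
w = np.exp(rng.normal(size=q_fixee.shape))
b = rng.normal()
q_fixed = w * adv + b

# Positive weights keep the advantage sign pattern, so the greedy joint action
# of the fixed value matches that of the fixee.
same_greedy = bool(np.argmax(q_fixed) == np.argmax(q_fixee))

def interaction(q):
    """Zero for any purely additive table q[i, j] = f(i) + g(j)."""
    return q[0, 0] + q[1, 1] - q[0, 1] - q[1, 0]

fixee_interaction = interaction(q_fixee)  # numerically zero: fixee is additive
fixed_interaction = interaction(q_fixed)  # generally nonzero after fixing
```

The design point is that the reweighting changes which joint values can be represented while leaving the greedy joint action untouched.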
Findings
QFIX improves performance over prior methods.
QFIX learns more stably than QPLEX.
QFIX uses simpler, smaller models.
Abstract
Value function decomposition methods for cooperative multi-agent reinforcement learning compose joint values from individual per-agent utilities, and train them using a joint objective. To ensure that the action selection process between individual utilities and joint values remains consistent, it is imperative for the composition to satisfy the individual-global max (IGM) property. Although satisfying IGM itself is straightforward, most existing methods (e.g., VDN, QMIX) have limited representation capabilities and are unable to represent the full class of IGM values, and the one exception that has no such limitation (QPLEX) is unnecessarily complex. In this work, we present a simple formulation of the full class of IGM values that naturally leads to the derivation of QFIX, a novel family of value function decomposition models that expand the representation capabilities of prior models…
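To make the IGM property concrete, the following sketch checks it on a toy problem: IGM holds when the joint action formed from each agent's greedy individual action also maximizes the joint value. The utilities `q1` and `q2`, the helper `satisfies_igm`, and the two-agent, three-action setup are assumptions chosen for illustration and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-agent utilities for 2 agents with 3 actions each.
q1 = rng.normal(size=3)
q2 = rng.normal(size=3)

# VDN-style composition: the joint value is the sum of individual utilities.
q_joint = q1[:, None] + q2[None, :]  # shape (3, 3)

def satisfies_igm(q_joint, utilities):
    """True if the per-agent greedy actions also maximize the joint value."""
    greedy = tuple(int(np.argmax(u)) for u in utilities)
    return bool(np.isclose(q_joint[greedy], q_joint.max()))

# An additive composition satisfies IGM by construction.
igm_vdn = satisfies_igm(q_joint, [q1, q2])

# Counterexample: move the joint maximum to a non-greedy joint action,
# so individual and joint action selection disagree.
bad_joint = q_joint.copy()
non_greedy = (int(np.argmin(q1)), int(np.argmin(q2)))
bad_joint[non_greedy] = q_joint.max() + 1.0
igm_bad = satisfies_igm(bad_joint, [q1, q2])
```

Here `igm_vdn` holds and `igm_bad` does not, which shows that IGM is a constraint on the relation between utilities and joint values rather than a property of the joint value alone.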
Peer Reviews
Decision·Submitted to ICLR 2026
- I find the theoretical contribution sound and ample. The proposal of QIGM is compact and complete, with a constructive proof. I find most of the theoretical derivation correct.
- The measurable-UAT framing is accurate and improves the rigor of IGM-completeness in the literature.
- The fixing intervention empirically stabilizes and improves performance, as verified in the experiments.
- The experiments are carefully designed, with comprehensive benchmarks and representative baselines for the
- The measurable-UAT claim is correct but depends on appendix-level assumptions. A brief mention in the main text, where the network family and convergence are explained, would make it complete.
- Some SOTA baselines such as MAPPO or more recent works are missing, but since the experiments are mainly meant to verify the theoretical contribution, I think it is acceptable to compare only with QMIX and QPLEX.
- This paper could also benefit from simplifying the key assumptions and findings and leave most of
This paper is well written and easy to read. The experiments provided are quite extensive and the method has strong theoretical grounding. The authors also aim to address specific limitations of previous methods, which is good. However, some points are less clear. Please find them below.
Overall, I am unsure about the strength of the motivation for the proposed method. The method is based on a structure that contains all functions that satisfy IGM, but other methods such as QTRAN can theoretically factorise any function, which means that all functions that satisfy IGM are theoretically also included in that set of functions. QPLEX also has a strong representational complexity, and the argument that it is "unnecessarily complex" does not sound convincing enough as a motiv
1. The paper provides a clean, minimal characterization of the IGM-complete function class and highlights a core mechanism (weighted transformation of advantages) without complex transformation stacks.
2. QFIX uses a “fixing” layer over existing factorizations, requiring small architectural changes.
3. Benchmarks across SMACv2 (9 scenarios, multiple races/sizes) and Overcooked; includes stability and model-size comparisons. The proposed method exhibits better convergence stability than QPLEX and
1. Comparisons only within the value-decomposition class (no MAPPO or MADDPG variants).
2. Some practical tricks (advantage detach, annealing) feel heuristic and seem to have limited theoretical justification.
3. Limited analysis of when fixing hurts performance.
Taxonomy
Topics: Reinforcement Learning in Robotics
