MAFE: Enabling Equitable Algorithm Design in Multi-Agent Multi-Stage Decision-Making Systems

Zachary McBride Lazri; Anirudh Nakra; Ivan Brugere; Danial Dervovic; Antigoni Polychroniadou; Furong Huang; Dana Dachman-Soled; Min Wu

arXiv:2502.18534 · cs.MA · February 10, 2026

Open Access · 3 Reviews

TL;DR

MAFE introduces a suite of multi-agent environments to evaluate and develop fairness-aware algorithms in complex, dynamic decision-making systems involving multiple interacting entities over time.

Contribution

The paper presents MAFE, a modular, open-source platform for simulating multi-agent systems to study fairness. It supports diverse domains and enables reproducible research.

Findings

1. MAFE enables evaluation of fairness in multi-agent settings.

2. Experiments reveal trade-offs between fairness, performance, and coordination.

3. The platform supports diverse, realistic social system simulations.

Abstract

Algorithmic fairness is often studied in static or single-agent settings, yet many real-world decision-making systems involve multiple interacting entities whose multi-stage actions jointly influence long-term outcomes. Existing fairness methods applied at isolated decision points frequently fail to mitigate disparities that accumulate over time. Although recent work has modeled fairness as a sequential decision-making problem, it typically assumes centralized agents or simplified dynamics, limiting its applicability to complex social systems. We introduce MAFE, a suite of Multi-Agent Fair Environments designed to simulate realistic, modular, and dynamic systems in which fairness emerges from the interplay of multiple agents. We demonstrate MAFEs across three domains -- loan processing, healthcare, and higher education -- that support heterogeneous agents, configurable interventions,…

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

- Provides a novel environment for studying fairness in heterogeneous multi-agent sequential systems. Its component functions for observables provide a flexible design for reward and fairness computations.
- The three provided environments reflect realistic scenarios and structures.
- Experiments show that even simple interventions on policies improve fairness performance over time.

Weaknesses

- The setting is limited to cooperative settings, where all the agents need to consider the same objective function. The current setting omits competitive or semi-cooperative dynamics that occur in real systems.
- Only basic interventions are provided as baselines. While the environments show fairness-utility trade-offs through interventions, these are relatively simple and deterministic policies. The framework would be more interesting if it supported adaptive or policy-driven interventions tha

Reviewer 02 · Rating 2 · Confidence 5

Strengths

1. I acknowledge that there is currently a gap in the community for more sophisticated benchmarks for measuring fairness in dynamic environments, and this paper addresses that need.
2. The modeling details, particularly those based on real data, represent substantial extensions over existing benchmarks such as D’Amour et al.

Weaknesses

1. The proposed algorithm is a straightforward adaptation of the cross-entropy method, and the environments are not benchmarked against the existing fairness-aware RL algorithms cited by the authors. While algorithmic novelty is not essential for a benchmark paper, I believe the environments should still be evaluated using state-of-the-art fairness-aware RL algorithms to better demonstrate their utility. However, such a comparison is completely missing from the paper and the environments are benchmark

Reviewer 03 · Rating 6 · Confidence 3

Strengths

* The study of long-term fairness in multi-agent sequential decision making is an important topic.
* The proposed domains use real-world data as part of the simulation.
* The empirical results look good using their proposed method, and the benchmark methodology seems to make sense.

Weaknesses

* The paper advertises a benchmark that promotes reproducibility and compatibility with standard MARL libraries. However, I do not see any experiments showcasing this. There are F-MAPPO and F-MADDPG, but the authors do not specify whether these come from off-the-shelf libraries. I think the paper would benefit from including comparisons of more baseline off-the-shelf MARL algorithms.
* While the use of real-world datasets to instantiate the MAFEs is nice, there is limited discussion on how reali

Taxonomy

Topics: Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing · Reinforcement Learning in Robotics