Fairness Aware Reinforcement Learning via Proximal Policy Optimization
Gabriele La Malfa, Jie M. Zhang, Michael Luck, Elizabeth Black

TL;DR
This paper proposes Fair-PPO, a reinforcement learning algorithm that incorporates fairness penalties based on demographic parity, balancing reward maximization with equitable outcomes in multi-agent systems.
Contribution
It introduces a novel fairness-aware PPO method with retrospective and prospective penalty components, improving fairness in multi-agent reinforcement learning scenarios.
Findings
Fair-PPO achieves fairer policies than standard PPO.
The method balances fairness with performance, maintaining efficiency.
It reveals diverse strategies to enhance fairness in multi-agent systems.
Abstract
Fairness in multi-agent systems (MAS) focuses on equitable reward distribution among agents in scenarios involving sensitive attributes such as race, gender, or socioeconomic status. This paper introduces fairness in Proximal Policy Optimization (PPO) with a penalty term derived from a fairness definition such as demographic parity, counterfactual fairness, or conditional statistical parity. The proposed method, which we call Fair-PPO, balances reward maximisation with fairness by integrating two penalty components: a retrospective component that minimises disparities in past outcomes and a prospective component that ensures fairness in future decision-making. We evaluate our approach in two games: the Allelopathic Harvest, a cooperative and competitive MAS focused on resource collection, where some agents possess a sensitive attribute, and HospitalSim, a hospital simulation, in which…
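To make the two-penalty idea concrete, here is a minimal sketch of a fairness-penalised PPO objective. It is an illustration under stated assumptions, not the paper's formulation: this summary does not give the exact penalty formula, so the sketch assumes demographic parity is measured as the absolute gap in mean reward between agents with and without the sensitive attribute. The function names and the weights `alpha` and `beta` are hypothetical.

```python
def demographic_parity_gap(values, sensitive):
    """Absolute gap in mean value between the two groups (assumed fairness measure)."""
    with_attr = [v for v, s in zip(values, sensitive) if s]
    without_attr = [v for v, s in zip(values, sensitive) if not s]
    if not with_attr or not without_attr:
        return 0.0  # no gap is defined when one group is empty
    mean_with = sum(with_attr) / len(with_attr)
    mean_without = sum(without_attr) / len(without_attr)
    return abs(mean_with - mean_without)


def clipped_surrogate(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over samples."""
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(ratios)


def fair_ppo_objective(ratios, advantages, past_rewards, predicted_returns,
                       sensitive, alpha=1.0, beta=1.0, eps=0.2):
    """Clipped surrogate minus two fairness penalties (hypothetical weighting).

    past_rewards:      per-agent rewards already collected (retrospective term)
    predicted_returns: per-agent expected future returns (prospective term)
    sensitive:         per-agent booleans marking the sensitive attribute
    """
    retrospective = demographic_parity_gap(past_rewards, sensitive)
    prospective = demographic_parity_gap(predicted_returns, sensitive)
    surrogate = clipped_surrogate(ratios, advantages, eps)
    return surrogate - alpha * retrospective - beta * prospective


# Toy example: four agents, the last two carry the sensitive attribute.
sensitive = [False, False, True, True]
objective = fair_ppo_objective(
    ratios=[1.0, 1.5],
    advantages=[1.0, 1.0],
    past_rewards=[4.0, 2.0, 1.0, 1.0],
    predicted_returns=[2.0, 2.0, 1.0, 2.0],
    sensitive=sensitive,
    alpha=0.5,
    beta=1.0,
)
print(round(objective, 3))
```

In this sketch, setting `alpha` and `beta` to zero recovers the plain clipped PPO objective, while larger weights trade reward for smaller gaps between the two groups.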
Taxonomy
Topics: Behavioral Health and Interventions · Ethics and Social Impacts of AI · Human-Automation Interaction and Safety
Methods: Entropy Regularization · Mixing Adam and SGD · Proximal Policy Optimization
