[Re] FairDICE: A Fair Tradeoff in Multi-objective Offline RL
Peter Adema, Karim Galliamov, Aleksey Evstratovskiy, Ross Geurts

TL;DR
FairDICE is a multi-objective offline RL method that aims to balance objectives automatically. This replication study examines its theoretical claims and its practical performance.
Contribution
The paper provides a replication study of FairDICE, identifying issues and demonstrating its potential when properly implemented and tuned.
Findings
Many theoretical claims of FairDICE hold.
A coding error reduced FairDICE to standard behaviour cloning in continuous environments.
Proper hyperparameter tuning is crucial for FairDICE's performance.
Abstract
Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not provide an efficient way to find a fair compromise. FairDICE (see arXiv:2506.08062v2) seeks to fill this gap by adapting OptiDICE (an offline RL algorithm) to automatically learn weights for multiple objectives, for example to incentivise fairness among them. As this would be a valuable contribution, this replication study examines the replicability of the claims made regarding FairDICE. We find that many theoretical claims hold, but an error in the code reduces FairDICE to standard behaviour cloning in continuous environments, and many important hyperparameters were originally underspecified. After rectifying this, we show in experiments…
Taxonomy
Topics: Ethics and Social Impacts of AI · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
