Offline Evaluation Measures of Fairness in Recommender Systems
Theresia Veronika Rampisela

TL;DR
This paper critically analyzes existing fairness evaluation measures in recommender systems, revealing their limitations and proposing new approaches and guidelines for more reliable offline fairness assessment.
Contribution
It identifies key flaws in current measures, introduces novel evaluation methods, and offers practical guidelines for their appropriate application.
Findings
Identified interpretability and applicability issues in existing measures.
Proposed new evaluation approaches that address identified limitations.
Provided guidelines for selecting suitable fairness measures in practice.
Abstract
The evaluation of recommender system fairness has become increasingly important, especially with recent legislation that emphasises the development of fair and responsible artificial intelligence. This has led to the emergence of various fairness evaluation measures, which quantify fairness based on different definitions. However, many such measures are simply proposed and used without further analysis of their robustness. As a result, there is insufficient understanding and awareness of the measures' limitations. Among other issues, it is not known what kind of model outputs produce the (un)fairest score, how the measure scores are empirically distributed, and whether there are cases where the measures cannot be computed (e.g., due to division by zero). These issues cause difficulty in interpreting the measure scores and confusion about which measure(s) should be used for a specific…
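As an illustration of the kinds of limitations the abstract describes, here is a minimal sketch using the Gini index over item exposure counts. The choice of the Gini index, the function name `gini_index`, and the toy exposure vectors are assumptions made for this example; they are not taken from the paper and do not represent its proposed measures. The sketch shows two of the issues named above: the score is undefined when no item receives any exposure (division by zero), and the most unfair achievable score for a finite catalogue is (n - 1)/n rather than 1.

```python
from typing import Optional, Sequence


def gini_index(exposures: Sequence[float]) -> Optional[float]:
    """Gini index of item exposure (illustrative, hypothetical helper).

    0 means every item received equal exposure; larger values mean
    exposure is more concentrated on a few items.
    Returns None when the measure is undefined.
    """
    n = len(exposures)
    total = sum(exposures)
    if n == 0 or total == 0:
        # The denominator n * total would be zero, so the score
        # cannot be computed for this output.
        return None
    ordered = sorted(exposures)
    # Standard rank-based form: sum_i (2i - n - 1) * x_i / (n * sum_i x_i),
    # with items ranked i = 1..n in ascending order of exposure.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(ordered, start=1))
    return weighted / (n * total)


# Perfectly equal exposure gives the fairest score.
print(gini_index([1, 1, 1, 1]))  # 0.0
# All exposure on one of four items gives (n - 1) / n, not 1.
print(gini_index([0, 0, 0, 4]))  # 0.75
# No exposure at all: the measure is undefined.
print(gini_index([0, 0, 0]))     # None
```

Returning `None` for the undefined case is one possible design choice; raising an exception or reporting the case separately would work as well, provided the evaluation does not silently treat it as a valid score.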
