Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement
Maryam Mousavian, Zahra Abbasiantaeb, Mohammad Aliannejadi, Fabio Crestani

TL;DR
This paper introduces a novel LLM-based approach and a new fairness metric, CWEx, to better detect, measure, and analyze gender bias in passage ranking systems, supported by annotated datasets and extensive experiments.
Contribution
It presents a new gender bias detection method using LLMs, a novel fairness metric CWEx, and annotated datasets to improve bias evaluation in IR systems.
Findings
CWEx offers more detailed fairness evaluation than previous metrics.
The proposed method aligns more closely with human judgments than existing metrics, reaching a Cohen's Kappa of 58.77%.
The approach effectively distinguishes gender bias in ranking models.
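For readers unfamiliar with the agreement statistic cited above, Cohen's Kappa compares the observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that computation, not the authors' evaluation code. The label names and the example annotations are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations (illustrative only, not data from the paper).
human = ["female", "male", "neutral", "neutral", "male", "female", "neutral", "neutral"]
llm = ["female", "male", "neutral", "male", "male", "neutral", "neutral", "neutral"]
kappa = cohens_kappa(human, llm)
print(f"kappa = {kappa:.2%}")
```

A value reported as a percentage, such as the 58.77% above, corresponds to kappa multiplied by 100.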
Abstract
The presence of social biases in Natural Language Processing (NLP) and Information Retrieval (IR) systems is an ongoing challenge, which underlines the importance of developing robust approaches to identifying and evaluating such biases. In this paper, we aim to address this issue by leveraging Large Language Models (LLMs) to detect and measure gender bias in passage ranking. Existing gender fairness metrics rely on lexical- and frequency-based measures, leading to various limitations, e.g., missing subtle gender disparities. Building on our LLM-based gender bias detection method, we introduce a novel gender fairness metric, named Class-wise Weighted Exposure (CWEx), aiming to address existing limitations. To measure the effectiveness of our proposed metric and study LLMs' effectiveness in detecting gender bias, we annotate a subset of the MS MARCO Passage Ranking collection and release…
