Towards FairRAG: Preventing Representational Harm in Retrieval-Augmented Generation by Enforcing Fair Exposure at Retrieval Time
Riddhi Tikoo

TL;DR
This paper investigates bias in retrieval-augmented generation (RAG) systems and introduces an exposure-aware ranking strategy that promotes fair exposure at retrieval time and reduces representational harm.
Contribution
The paper proposes a new exposure-aware ranking method, Representative Stochastic, to mitigate bias in RAG systems, and demonstrates its effectiveness on the TREC 2022 Fair Ranking dataset.
Findings
The Representative Stochastic ranker achieves near-parity exposure between protected and non-protected articles.
Relevance scores are influenced by initial representational bias.
Demographic parity in the generated outputs reflects exposure fairness at retrieval time.
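The findings above are stated in terms of exposure parity between protected and non-protected articles. This page does not define how exposure is measured, so the following is a minimal Python sketch under a common assumption: rank k receives exposure 1/log2(k + 1) (a DCG-style position discount), and parity is the ratio of mean per-article exposure for protected versus non-protected articles. The function names `position_exposure` and `exposure_ratio` are hypothetical and are not taken from the paper.

```python
import math
from collections.abc import Sequence


def position_exposure(rank: int) -> float:
    """Exposure of a 1-indexed rank under an assumed DCG-style log2 discount."""
    return 1.0 / math.log2(rank + 1)


def _mean(values: list[float]) -> float:
    return sum(values) / len(values)


def exposure_ratio(ranking: Sequence[str], protected: set[str]) -> float:
    """Mean exposure of protected documents divided by that of non-protected ones.

    1.0 means exact parity; values below 1.0 mean protected documents are
    under-exposed. Hypothetical metric: both groups must appear in the ranking.
    """
    prot: list[float] = []
    nonprot: list[float] = []
    for rank, doc in enumerate(ranking, start=1):
        if doc in protected:
            prot.append(position_exposure(rank))
        else:
            nonprot.append(position_exposure(rank))
    if not prot or not nonprot:
        raise ValueError("ranking must contain both protected and non-protected documents")
    return _mean(prot) / _mean(nonprot)


# Usage with hypothetical article IDs; "c" and "d" are the protected articles.
protected_ids = {"c", "d"}
print(round(exposure_ratio(["a", "b", "c", "d"], protected_ids), 3))  # 0.571
print(round(exposure_ratio(["a", "c", "b", "d"], protected_ids), 3))  # 0.708
```

Interleaving the two groups moves the ratio toward 1.0 even though the set of retrieved articles is unchanged, which is why exposure is measured over positions rather than over membership in the top-k alone.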
Abstract
As Large Language Model (LLM) integration has accelerated in high-stakes domains, model hallucination has become a critical issue. Retrieval-augmented generation (RAG) is a technique for addressing hallucination; however, RAG's multi-component pipeline creates several points at which bias can be introduced. This study considers two previously developed utility-focused ranking strategies (Standard and Stochastic) alongside two proposed exposure-aware approaches (Forced-Exposure and Representative Stochastic). Using the TREC 2022 Fair Ranking Dataset, which contains Wikipedia articles annotated as protected or non-protected, the study asked the LLM to identify relevant articles, with citations, for four scenario-based Q&A prompts. The retrieval rankings and the generated outputs were evaluated for exposure bias and utility across all ranking methods. Overall, the Representative Stochastic ranker…
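The abstract does not say how the Stochastic or Representative Stochastic rankers randomize their output. One widely used way to make a ranking stochastic is Plackett-Luce sampling, so the Python sketch below assumes that model; it is not the paper's Representative Stochastic ranker. It compares a score-sorted ranking (assumed to resemble a Standard ranker) with Plackett-Luce rankings drawn at a low and a high temperature, and estimates by Monte Carlo the ratio of mean exposure for protected versus non-protected documents under an assumed 1/log2(k + 1) position discount. The relevance scores, the `temperature` parameter, and all function names are hypothetical.

```python
import math
import random


def position_exposure(rank: int) -> float:
    # Assumed DCG-style log2 discount for a 1-indexed rank.
    return 1.0 / math.log2(rank + 1)


def sample_plackett_luce(scores: dict[str, float], temperature: float,
                         rng: random.Random) -> list[str]:
    # Exponential-race form of Plackett-Luce sampling: each document draws an
    # arrival time from Exp(rate = exp(score / temperature)); sorting by arrival
    # time gives a Plackett-Luce ranking with those weights.
    arrival = {doc: rng.expovariate(math.exp(s / temperature)) for doc, s in scores.items()}
    return sorted(arrival, key=arrival.__getitem__)


def _group_totals(ranking: list[str], protected: set[str]) -> tuple[float, float]:
    # Total exposure received by protected and non-protected documents in one ranking.
    prot = sum(position_exposure(r) for r, d in enumerate(ranking, 1) if d in protected)
    non = sum(position_exposure(r) for r, d in enumerate(ranking, 1) if d not in protected)
    return prot, non


def _ratio(prot_total: float, non_total: float,
           scores: dict[str, float], protected: set[str]) -> float:
    # Convert group totals into a ratio of mean per-document exposure.
    n_prot = sum(1 for d in scores if d in protected)
    n_non = len(scores) - n_prot
    return (prot_total / n_prot) / (non_total / n_non)


def deterministic_exposure_ratio(scores: dict[str, float], protected: set[str]) -> float:
    # Score-sorted ranking: highest relevance first, no randomness.
    ranking = sorted(scores, key=scores.__getitem__, reverse=True)
    prot, non = _group_totals(ranking, protected)
    return _ratio(prot, non, scores, protected)


def expected_exposure_ratio(scores: dict[str, float], protected: set[str],
                            temperature: float, n_samples: int = 2000,
                            seed: int = 0) -> float:
    # Monte Carlo estimate of the expected exposure ratio under Plackett-Luce sampling.
    rng = random.Random(seed)
    prot_total = non_total = 0.0
    for _ in range(n_samples):
        prot, non = _group_totals(sample_plackett_luce(scores, temperature, rng), protected)
        prot_total += prot
        non_total += non
    return _ratio(prot_total, non_total, scores, protected)


# Hypothetical relevance scores; "c" and "d" are the protected documents.
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3}
protected = {"c", "d"}
print(round(deterministic_exposure_ratio(scores, protected), 3))  # 0.571
print(expected_exposure_ratio(scores, protected, temperature=0.1))   # near the sorted ranking
print(expected_exposure_ratio(scores, protected, temperature=10.0))  # near parity (1.0)
```

Raising the temperature flattens the sampling weights, so lower-scored documents are placed higher more often. This moves expected exposure toward parity at the cost of placing the most relevant documents lower on average, which is the utility side of the trade-off the abstract evaluates.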
