Relative Bias: A Comparative Framework for Quantifying Bias in LLMs
Alireza Arbabi, Florian Kerschbaum

TL;DR
This paper introduces the Relative Bias framework, which quantifies bias by measuring how one LLM's behavior deviates from that of other LLMs within a specified target domain. It combines Embedding Transformation analysis with an LLM-as-a-Judge evaluation.
Contribution
The paper presents a novel comparative framework for quantifying bias in LLMs, combining embedding analysis and LLM-based evaluation methods.
Findings
Strong correlation between the two bias scoring methods
Framework is systematic, scalable, and statistically grounded
Effective in bias and alignment case studies
Abstract
The growing deployment of large language models (LLMs) has amplified concerns regarding their inherent biases, raising critical questions about their fairness, safety, and societal impact. However, quantifying LLM bias remains a fundamental challenge, complicated by the ambiguity of what "bias" entails. This challenge grows as new models emerge rapidly and gain widespread use, while introducing potential biases that have not been systematically assessed. In this paper, we propose the Relative Bias framework, a method designed to assess how an LLM's behavior deviates from other LLMs within a specified target domain. We introduce two complementary methodologies: (1) Embedding Transformation analysis, which captures relative bias patterns through sentence representations over the embedding space, and (2) LLM-as-a-Judge, which employs a language model to evaluate outputs comparatively.…
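As a rough illustration of the embedding-based idea described in the abstract, the sketch below scores how far one model's response embeddings sit from those of a set of baseline models on the same prompts. It is a minimal sketch under stated assumptions and not the authors' procedure. The function name `relative_deviation`, the use of cosine distance, and the array layout are all hypothetical choices made for this example. The embeddings are assumed to come from some sentence encoder that is not shown here.

```python
import numpy as np

def relative_deviation(target, baselines):
    """Per-prompt mean cosine distance between a target model and baselines.

    target:    array of shape (n_prompts, dim), one response embedding per prompt
    baselines: array of shape (n_models, n_prompts, dim), same prompts, other models
    Returns an array of shape (n_prompts,). Larger values mean the target's
    responses lie further from the baselines' responses in embedding space.
    (Hypothetical scoring rule, chosen for illustration only.)
    """
    target = np.asarray(target, dtype=float)
    baselines = np.asarray(baselines, dtype=float)

    # Normalize to unit length so the dot product equals cosine similarity.
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    b = baselines / np.linalg.norm(baselines, axis=2, keepdims=True)

    # cos[m, p] = cosine similarity between target and baseline m on prompt p.
    cos = np.einsum("pd,mpd->mp", t, b)

    # Average the cosine distance over baseline models.
    return (1.0 - cos).mean(axis=0)
```

A per-prompt score like this only says how far the target is from the baselines. Deciding whether that gap is meaningful would still need a statistical comparison, for example against the spread among the baseline models themselves.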
