Do LLMs have a Gender (Entropy) Bias?
Sonal Prabhune, Balaji Padmanabhan, and Kaushik Dutta

TL;DR
This paper examines gender bias in large language models by introducing a new benchmark dataset and comparing the entropy of responses generated for men and for women. It finds no significant bias at the category level but notable differences at the level of individual questions, and it proposes a prompt-based debiasing method.
Contribution
The paper introduces RealWorldQuestioning, a new benchmark dataset for gender bias analysis in LLMs, and proposes a simple prompt-based debiasing strategy to mitigate entropy bias.
Findings
No significant category-level gender bias detected.
Substantial question-level response differences exist between genders.
Debiasing method improves response balance and information content in 78% of cases.
Abstract
We investigate the existence and persistence of a specific type of gender bias in some of the popular LLMs and contribute a new benchmark dataset, RealWorldQuestioning (released on HuggingFace), developed from real-world questions across four key domains in business and health contexts: education, jobs, personal financial management, and general health. We study entropy bias, which we define as a discrepancy in the amount of information generated by an LLM in response to real questions users have asked. We tested this using four different LLMs and evaluated the generated responses both qualitatively and quantitatively by using ChatGPT-4o (as "LLM-as-judge"). Our analyses (metric-based comparisons and "LLM-as-judge" evaluation) suggest that there is no significant bias in LLM responses for men and women at a category level. However, at a finer granularity (the individual…
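As a rough illustration of the entropy-bias idea described above, the sketch below computes the Shannon entropy of the word distribution in two responses and reports the difference. It is only a minimal sketch under stated assumptions: the whitespace tokenization and the function names shannon_entropy and entropy_gap are hypothetical choices made for this example, and the summary does not say which metrics the authors actually used.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits, of the word distribution in `text`.

    Assumption: words are lowercased whitespace-separated tokens. This is
    an illustrative choice, not necessarily the paper's tokenization.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_gap(response_a: str, response_b: str) -> float:
    """Entropy of response_a minus entropy of response_b (hypothetical helper).

    A value far from zero for the same question, asked once on behalf of a
    man and once on behalf of a woman, would hint at a question-level gap.
    """
    return shannon_entropy(response_a) - shannon_entropy(response_b)
```

Applied to a pair of responses to the same question, a gap near zero suggests similar information content under this crude measure. Averaging gaps over all questions in a category can hide large per-question gaps of opposite sign, which is consistent with the category-level versus question-level contrast reported in the findings.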
Taxonomy
Topics: Artificial Intelligence in Law · Law, AI, and Intellectual Property
