GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu

TL;DR
GenderCARE introduces a comprehensive framework with new benchmarks and debiasing techniques to assess and reduce gender bias in large language models, achieving significant bias reduction while maintaining performance.
Contribution
It presents a novel, flexible benchmark and debiasing methods that address the limitations of previous approaches, including their limited inclusivity of diverse gender groups.
Findings
Peak reduction in gender bias metrics above 90%
Average bias reduction above 35% across 17 LLMs
Minimal impact on language task performance (<2%)
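The page does not say how the reduction percentages are computed. As a hedged illustration, assuming each figure is a relative reduction of a per-model bias score, (before - after) / before, the sketch below shows how a single model can reach a peak above 90% while the mean across all models stays much lower. The model names, the scores, and the helper `relative_reduction` are hypothetical placeholders, not results or code from the paper.

```python
# Hypothetical per-model bias scores (lower = less biased), before and after debiasing.
# Illustrative placeholders only; these are NOT numbers reported by GenderCARE.
scores = {
    "model_a": (0.50, 0.04),
    "model_b": (0.40, 0.30),
    "model_c": (0.60, 0.45),
}


def relative_reduction(before: float, after: float) -> float:
    """Assumed definition: fraction of the baseline bias score removed by debiasing."""
    if before <= 0:
        raise ValueError("baseline bias score must be positive")
    return (before - after) / before


# Per-model reductions, then the two summary statistics quoted on the page.
reductions = {name: relative_reduction(b, a) for name, (b, a) in scores.items()}
peak = max(reductions.values())                    # best single model
average = sum(reductions.values()) / len(reductions)  # mean over all models

print(f"peak={peak:.0%} average={average:.0%}")
```

Under this assumed definition, "peak" and "average" answer different questions: the first reports the best case for one model, the second summarizes the typical effect across the evaluated models.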
Abstract
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed…
Taxonomy
Topics: Natural Language Processing Techniques · Interpreting and Communication in Healthcare · Hate Speech and Cyberbullying Detection
