GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

Kunsheng Tang; Wenbo Zhou; Jie Zhang; Aishan Liu; Gelei Deng; Shuai Li; Peigui Qi; Weiming Zhang; Tianwei Zhang; Nenghai Yu

arXiv:2408.12494·cs.CL·February 25, 2025

TL;DR

GenderCARE introduces a comprehensive framework with new benchmarks and debiasing techniques to assess and reduce gender bias in large language models, achieving significant bias reduction while maintaining performance.

Contribution

The paper presents a novel, flexible benchmark and debiasing methods that address limitations of previous approaches, including their limited coverage of diverse gender groups.

Findings

01 Peak reduction of over 90% in gender bias metrics

02 Average bias reduction above 35% across 17 LLMs

03 Minimal impact on language task performance (variability below 2%)

Abstract

Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed…

Code & Models

Repositories

kstanghere/gendercare-ccs24 (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Interpreting and Communication in Healthcare · Hate Speech and Cyberbullying Detection