Detection, Classification, and Mitigation of Gender Bias in Large Language Models

Xiaoqing Cheng; Hongying Zan; Lulu Kong; Jinwang Song; Min Peng

arXiv:2506.12527·cs.CL·June 17, 2025

TL;DR

This paper presents an approach to detecting, classifying, and mitigating gender bias in large language models that combines reinforcement learning, chain-of-thought reasoning, and supervised fine-tuning. The approach ranked first in all three subtasks of NLPCC 2025 Shared Task 7.

Contribution

The paper combines reinforcement learning, chain-of-thought (CoT) reasoning, and fine-tuning with GPT-4 annotations to reduce gender bias in LLMs.

Findings

01 Ranked first in all three subtasks of the shared task.

02 Enhanced bias detection accuracy through multi-step reasoning.

03 Effective bias mitigation via DPO with GPT-4 annotated data.
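
The third finding names DPO (Direct Preference Optimization) as the mitigation technique. As a rough illustration only, the standard per-pair DPO objective can be sketched as below. This is the generic published form of the loss, not the authors' training code; the function names, the `beta` value, and the assumption that the GPT-4 annotated data supplies (preferred, dispreferred) response pairs are all illustrative and not confirmed by this page.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the page does not state the one used
) -> float:
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    "Chosen" is the preferred response (hypothetically, the less biased
    one) and "rejected" is the dispreferred response.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy widens the gap in favor of the chosen response.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy and the reference model agree on both responses, the loss equals log 2; it drops below that value once the policy assigns relatively more probability to the chosen response than the reference does.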

Abstract

Large language models (LLMs) have developed rapidly and have significantly improved efficiency across a wide range of domains. However, recent studies have revealed that LLMs often exhibit gender bias, leading to serious social implications. Detecting, classifying, and mitigating gender bias in LLMs has therefore become a critical research focus. In the NLPCC 2025 Shared Task 7: Chinese Corpus for Gender Bias Detection, Classification and Mitigation Challenge, we investigate how to enhance the capabilities of LLMs in gender bias detection, classification, and mitigation. We adopt reinforcement learning, chain-of-thought (CoT) reasoning, and supervised fine-tuning to handle the different subtasks. Specifically, for Subtasks 1 and 2, we leverage the internal reasoning capabilities of LLMs to guide multi-step thinking in a staged manner, which simplifies complex biased queries and…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection