Detection, Classification, and Mitigation of Gender Bias in Large Language Models
Xiaoqing Cheng, Hongying Zan, Lulu Kong, Jinwang Song, Min Peng

TL;DR
This paper presents an approach to detecting, classifying, and mitigating gender bias in large language models using reinforcement learning, chain-of-thought reasoning, and supervised fine-tuning, ranking first in all three subtasks of NLPCC 2025 Shared Task 7.
Contribution
The paper combines reinforcement learning, chain-of-thought (CoT) reasoning, and fine-tuning on GPT-4-annotated data to reduce gender bias in LLMs.
Findings
Ranked first in all three subtasks of the shared task.
Enhanced bias detection accuracy through multi-step reasoning.
Effective bias mitigation via Direct Preference Optimization (DPO) with GPT-4-annotated data.
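For readers unfamiliar with the preference-optimization objective named in the last finding, the following is a minimal sketch of the standard per-example DPO loss. It is not the authors' implementation: the function name `dpo_loss`, the default `beta=0.1`, and the log-probability values in the usage note are illustrative assumptions, and the paper's actual training setup is not described in this summary.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Inputs are summed token log-probabilities of the preferred ("chosen")
    and dispreferred ("rejected") responses under the trained policy and
    under a frozen reference model. beta=0.1 is an assumed value.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference assign identical log-probabilities, `dpo_loss(0.0, 0.0, 0.0, 0.0)` equals log 2. The loss falls below that value when the policy raises the chosen response relative to the rejected one, for example a less biased rewrite preferred over the original sentence.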
Abstract
Large language models (LLMs) have developed rapidly and have significantly improved efficiency across a wide range of domains. However, recent studies have revealed that LLMs often exhibit gender bias, which has serious social implications. Detecting, classifying, and mitigating gender bias in LLMs has therefore become a critical research focus. In the NLPCC 2025 Shared Task 7: Chinese Corpus for Gender Bias Detection, Classification and Mitigation Challenge, we investigate how to enhance the capabilities of LLMs in gender bias detection, classification, and mitigation. We adopt reinforcement learning, chain-of-thought (CoT) reasoning, and supervised fine-tuning to handle the different subtasks. Specifically, for Subtasks 1 and 2, we leverage the internal reasoning capabilities of LLMs to guide multi-step thinking in a staged manner, which simplifies complex biased queries and…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
