Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
Kangda Wei, Hasnat Md Abdullah, Ruihong Huang

TL;DR
This paper introduces a novel data generation framework that reduces gender bias in Large Language Models by fostering exploratory thinking through morally ambiguous story pairs and balanced judgments, improving fairness without sacrificing performance.
Contribution
The paper presents a data generation method that pairs stories with male and female protagonists and compares the model's moral judgments on them to mitigate gender bias in LLMs, improving fairness while preserving or enhancing general model capabilities.
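The story-pair comparison can be illustrated with a minimal sketch. It assumes that a scenario is a text template with protagonist slots, that only the name and pronouns change between the two stories, and that a moral judgment can be reduced to a short verdict label for comparison. `judge` and `neutralize` are hypothetical stand-ins for LLM calls, and the template, names, and helper functions are illustrative. Treating the balanced judgment as the preferred response and the inconsistent ones as rejected is an assumption about how preference pairs might be formed, not the authors' exact recipe.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical protagonist fillers: only the name and pronouns differ, so the
# two stories stay structurally identical.
PROTAGONISTS = {
    "male": {"name": "James", "subj": "he", "poss": "his"},
    "female": {"name": "Emily", "subj": "she", "poss": "her"},
}

# Illustrative morally ambiguous scenario (not taken from the paper's data).
TEMPLATE = ("{name} lied to {poss} manager about being sick so that "
            "{subj} could care for a friend in crisis.")


def make_story_pair(template: str) -> tuple[str, str]:
    """Fill one template with a male and a female protagonist."""
    return (template.format(**PROTAGONISTS["male"]),
            template.format(**PROTAGONISTS["female"]))


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # balanced, gender-neutral judgment
    rejected: str  # the model's original, inconsistent judgment


def build_preference_pairs(
    template: str,
    judge: Callable[[str], str],
    neutralize: Callable[[str, str, str, str], str],
) -> list[PreferencePair]:
    male_story, female_story = make_story_pair(template)
    male_verdict, female_verdict = judge(male_story), judge(female_story)
    if male_verdict == female_verdict:
        return []  # consistent judgments: no bias signal to learn from
    balanced = neutralize(male_story, female_story, male_verdict, female_verdict)
    return [PreferencePair(male_story, balanced, male_verdict),
            PreferencePair(female_story, balanced, female_verdict)]


# Toy stand-ins for LLM calls; this judge is deliberately gender-biased.
def toy_judge(story: str) -> str:
    return "unacceptable" if "she" in story.split() else "acceptable"


def toy_neutralize(m_story: str, f_story: str, m_v: str, f_v: str) -> str:
    return ("Lying to an employer is wrong, but the motive is sympathetic; "
            "the protagonist's gender should not change this judgment.")


for pair in build_preference_pairs(TEMPLATE, toy_judge, toy_neutralize):
    print(pair.rejected, "->", pair.chosen)
```

Comparing verdict labels keeps the consistency check trivial; a real pipeline would likely need the model itself, or a separate classifier, to decide whether two free-text judgments disagree.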
Findings
Significant reduction in gender bias.
General model capabilities preserved or improved.
Code and generated data are released.
Abstract
Large Language Models (LLMs) often exhibit gender bias, treating male and female subjects unequally across contexts. To address this issue, we propose a novel data generation framework that fosters exploratory thinking in LLMs. Our approach prompts models to generate story pairs featuring male and female protagonists in structurally identical, morally ambiguous scenarios, then elicits and compares their moral judgments. When the judgments are inconsistent, the model is guided to produce balanced, gender-neutral judgments. These story-judgment pairs are then used to fine-tune the models or to optimize them via Direct Preference Optimization (DPO). Experimental results show that our method significantly reduces gender bias while preserving or even enhancing general model capabilities. We release the code and generated data at:…
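The abstract does not give training details, but the standard DPO objective that such preference pairs would feed into can be sketched as follows. It assumes each log-probability is the summed token log-probability of a judgment given its story, computed elsewhere under the trained policy and a frozen reference model; `dpo_loss`, `mean_dpo_loss`, and the default `beta=0.1` are illustrative choices, not settings reported in the paper.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # Log-ratio of the policy to the frozen reference model for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_dpo_loss(batch, beta: float = 0.1) -> float:
    """Average loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*row, beta=beta) for row in batch) / len(batch)


# Illustrative numbers only: the policy has shifted probability toward the
# balanced judgment (chosen) and away from the biased one (rejected).
print(round(dpo_loss(-1.0, -3.0, -2.0, -2.0), 4))  # prints 0.5981
```

When the policy matches the reference model the margin is zero and the loss is log 2; it falls as the policy prefers the balanced judgment more strongly than the reference does.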
