LFTF: Locating First and Then Fine-Tuning for Mitigating Gender Bias in Large Language Models

Zhanyue Qin; Yue Ding; Deyuan Liu; Qingbin Liu; Junxian Cai; Xi Chen; Zhiying Tu; Dianhui Chu; Cuiyun Gao; Dianbo Sui

arXiv:2505.15475 · cs.CL · May 22, 2025

TL;DR

This paper introduces datasets and metrics for measuring gender bias in large language models, and proposes the LFTF algorithm that effectively mitigates bias by targeting and fine-tuning the most relevant model blocks.
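
As a rough illustration of the "locate, then fine-tune" idea in the TL;DR, the following Python sketch picks the block with the highest relevance score and freezes every other block. The `Block` class, the function names, and the relevance scores are all hypothetical. This page does not describe how the paper actually scores blocks, and the sketch selects a single block only for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for one model block.

    `trainable` mimics a per-parameter "requires gradient" flag.
    """
    index: int
    trainable: bool = True


def locate_block(relevance):
    """Return the index of the block with the highest relevance score."""
    return max(range(len(relevance)), key=lambda i: relevance[i])


def freeze_all_but(blocks, target):
    """Mark only the located block as trainable and freeze the rest."""
    for block in blocks:
        block.trainable = (block.index == target)
    return blocks


# Made-up relevance scores, one per block (assumption, not from the paper).
relevance = [0.12, 0.48, 0.91, 0.30]
blocks = [Block(i) for i in range(len(relevance))]

# Step 1: locate the most relevant block.
target = locate_block(relevance)

# Step 2: restrict fine-tuning to that block.
freeze_all_but(blocks, target)
trainable = [b.index for b in blocks if b.trainable]
```

In a real model, the second step would correspond to disabling gradient updates for every parameter outside the selected block before running the fine-tuning loop.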

Contribution

The paper presents a novel bias evaluation framework with new datasets and metrics, and introduces the LFTF algorithm for targeted bias mitigation in LLMs.

Findings

01 LFTF significantly reduces gender bias in LLMs.

02 The proposed metrics effectively quantify gender bias and bias consistency.

03 LFTF maintains the general capabilities of LLMs after mitigation.

Abstract

Large Language Models (LLMs) have attracted widespread attention for their strong performance. However, because they are unavoidably exposed to socially biased data during training, LLMs tend to exhibit social biases, particularly gender bias. To better explore and quantify the degree of gender bias in LLMs, we propose a pair of datasets, GenBiasEval and GenHintEval. GenBiasEval evaluates the degree of gender bias in LLMs and is accompanied by an evaluation metric named AFGB-Score (Absolutely Fair Gender Bias Score). GenHintEval assesses whether LLMs can provide responses consistent with prompts that contain gender hints, and is accompanied by the evaluation metric UB-Score (UnBias Score). In addition, to mitigate gender bias in LLMs more effectively, we present the LFTF (Locating First and Then…

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling

Methods: Softmax · Attention Is All You Need