PRIDE -- Parameter-Efficient Reduction of Identity Discrimination for Equality in LLMs
Maluna Menke, Thilo Hagendorff

TL;DR
This paper evaluates parameter-efficient fine-tuning methods, especially LoRA, for reducing gender- and sexual-identity biases in large language models. It reports substantial fairness gains on the WinoQueer benchmark while adding less than 0.1% extra parameters.
Contribution
It demonstrates that LoRA fine-tuning on a curated queer corpus effectively reduces bias in LLMs, offering a lightweight alternative to full-model fine-tuning.
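To give a feel for why LoRA counts as "lightweight", here is a back-of-the-envelope parameter count. LoRA freezes a weight matrix W of shape d_out x d_in and trains only a low-rank update B @ A, with B of shape d_out x r and A of shape r x d_in, so each adapted matrix adds r * (d_out + d_in) trainable parameters. The model size, layer count, hidden size, rank, and choice of adapted projections below are hypothetical placeholders, not the configuration used in the paper.

```python
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight.

    The update is B @ A with B: d_out x rank and A: rank x d_in,
    so only rank * (d_out + d_in) new values are trained.
    """
    return rank * (d_out + d_in)


# Hypothetical configuration (NOT taken from the paper): a 7B-parameter
# decoder with 32 layers and hidden size 4096, with LoRA rank 8 applied
# to the query and value projections of every layer.
total_params = 7_000_000_000
layers, hidden, rank = 32, 4096, 8
adapted_per_layer = 2  # query and value projections

added = layers * adapted_per_layer * lora_params(hidden, hidden, rank)
fraction_pct = 100 * added / total_params

print(added, f"{fraction_pct:.3f}%")  # 4194304 0.060%
```

Under these assumptions, the adapters add about 4.2M trainable parameters, roughly 0.06% of the base model, which is consistent in scale with the "< 0.1% additional parameters" figure quoted in the abstract.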
Findings
LoRA reduces bias scores by up to 50 points.
Neutrality rises from virtually 0% to as much as 36%.
Soft-prompt tuning shows marginal improvements.
Abstract
Large Language Models (LLMs) frequently reproduce the gender- and sexual-identity prejudices embedded in their training corpora, leading to outputs that marginalize LGBTQIA+ users. Hence, reducing such biases is of great importance. To achieve this, we evaluate two parameter-efficient fine-tuning (PEFT) techniques, Low-Rank Adaptation (LoRA) and soft-prompt tuning, as lightweight alternatives to full-model fine-tuning for mitigating such biases. Using the WinoQueer benchmark, we quantify bias in three open-source LLMs and observe baseline bias scores reaching up to 98 (out of 100) across a range of queer identities defined by gender and/or sexual orientation, where 50 would indicate neutrality. Fine-tuning with LoRA (< 0.1% additional parameters) on a curated QueerNews corpus reduces those scores by up to 50 points and raises neutrality from virtually 0% to as much as 36%. Soft-prompt…
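The abstract reads bias scores on a 0 to 100 scale where 50 means neutrality. The sketch below shows how a paired-sentence score of that kind can be computed, assuming (as in CrowS-Pairs-style benchmarks) that the score is the percentage of sentence pairs in which the model assigns a higher log-likelihood to the stereotypical sentence than to its counter-stereotypical counterpart. The function name, the tie handling (ties count as not preferring the stereotype), and the toy log-likelihood values are illustrative assumptions; how the per-sentence likelihoods are obtained from a model is not shown, and the exact WinoQueer scoring procedure may differ.

```python
from typing import Sequence


def bias_score(stereo_ll: Sequence[float], counter_ll: Sequence[float]) -> float:
    """Percentage of pairs where the stereotypical sentence is more likely.

    Hypothetical helper: 50 means no systematic preference, 100 means the
    model always prefers the stereotypical sentence. Ties are counted as
    not preferring the stereotype (an assumption made for this sketch).
    """
    if len(stereo_ll) != len(counter_ll) or not stereo_ll:
        raise ValueError("need two non-empty sequences of equal length")
    prefers_stereo = sum(s > c for s, c in zip(stereo_ll, counter_ll))
    return 100.0 * prefers_stereo / len(stereo_ll)


# Toy log-likelihoods invented for illustration, not WinoQueer data.
stereo = [-12.1, -9.8, -15.3, -11.0]
counter = [-13.4, -10.2, -14.9, -11.5]

print(bias_score(stereo, counter))  # 75.0
```

On this reading, a baseline score of 98 means the model prefers the stereotypical sentence in almost every pair, and a reduction of 50 points moves such a model close to the neutral value of 50.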
Taxonomy
Topics: Computational and Text Analysis Methods · Hate Speech and Cyberbullying Detection · Topic Modeling
