Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
Yue Xu, Chengyan Fu, Li Xiong, Sibei Yang, Wenjie Wang

TL;DR
This paper introduces FaIRMaker, an automated, model-independent framework that uses auto-search and refinement to generate instructions called Fairwords, effectively reducing gender bias in large language models without sacrificing task performance.
Contribution
FaIRMaker is a novel automated framework that adaptively generates and refines Fairwords to mitigate gender bias in LLMs, overcoming limitations of existing bias mitigation methods.
Findings
Effectively reduces gender bias in LLM responses.
Preserves task performance while mitigating bias.
Compatible with both API-based and open-source models.
Abstract
Pre-training large language models (LLMs) on vast text corpora enhances natural language processing capabilities but risks encoding social biases, particularly gender bias. While parameter-modification methods like fine-tuning mitigate bias, they are resource-intensive, unsuitable for closed-source models, and lack adaptability to evolving societal norms. Instruction-based approaches offer flexibility but often compromise task performance. To address these limitations, we propose FaIRMaker, an automated and model-independent framework that employs an auto-search and refinement paradigm to adaptively generate Fairwords, which act as instructions integrated into input queries to reduce gender bias and enhance response quality. Extensive experiments demonstrate that FaIRMaker automatically searches for and dynamically refines Fairwords, effectively mitigating gender…
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
