An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation
Daiki Shirafuji, Tatsuhiko Saito, Yasutomo Kimura

TL;DR
This paper empirically compares seven model merging algorithms for social bias mitigation in large language models, revealing trade-offs between bias reduction and task performance, and identifying the most balanced methods.
Contribution
It provides the first comprehensive empirical evaluation of multiple model merging algorithms for bias mitigation across diverse LLMs and datasets.
Findings
SLERP at moderate weights offers the best bias-performance balance
Bias mitigation often reduces accuracy on reading and reasoning tasks
Linear, SLERP, and Nearswap are consistently effective in bias reduction
Abstract
Large language models (LLMs) are known to inherit and even amplify societal biases present in their pre-training corpora, threatening fairness and social trust. To address this issue, recent work has explored "editing" LLM parameters to mitigate social bias with model merging approaches; however, no empirical comparison of these approaches exists. In this work, we empirically survey seven algorithms: Linear, Karcher Mean, SLERP, NuSLERP, TIES, DELLA, and Nearswap, applying them to 13 open-weight models from the GPT, LLaMA, and Qwen families. We perform a comprehensive evaluation using three bias datasets (BBQ, BOLD, and HONEST) and measure the impact of these techniques on LLM performance in downstream tasks of the SuperGLUE benchmark. We find a trade-off between bias reduction and downstream performance: methods achieving greater bias mitigation degrade accuracy, particularly on tasks requiring reading…
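As a rough illustration of the two simplest algorithms named above, the sketch below merges two weight tensors with Linear interpolation and with SLERP (spherical linear interpolation). It is a generic NumPy rendering of the textbook formulas, not the authors' implementation. The function names `linear_merge` and `slerp_merge`, the interpolation weight `t`, and the toy vectors are assumptions made for illustration only.

```python
import numpy as np


def linear_merge(w_a, w_b, t):
    """Weighted average of two tensors: (1 - t) * w_a + t * w_b."""
    return (1.0 - t) * w_a + t * w_b


def slerp_merge(w_a, w_b, t, eps=1e-8):
    """Spherical linear interpolation between two tensors of equal shape.

    The angle is measured between the flattened, normalized tensors.
    Falls back to linear interpolation when either tensor is near zero
    or the two are (anti)parallel, where the SLERP formula is ill-defined.
    """
    a = w_a.ravel()
    b = w_b.ravel()
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a < eps or norm_b < eps:
        return linear_merge(w_a, w_b, t)

    # Angle between the two directions, clipped for numerical safety.
    cos_omega = np.clip(np.dot(a / norm_a, b / norm_b), -1.0, 1.0)
    omega = np.arccos(cos_omega)
    sin_omega = np.sin(omega)
    if sin_omega < eps:
        return linear_merge(w_a, w_b, t)

    coef_a = np.sin((1.0 - t) * omega) / sin_omega
    coef_b = np.sin(t * omega) / sin_omega
    return (coef_a * a + coef_b * b).reshape(w_a.shape)


if __name__ == "__main__":
    # Toy stand-ins for one layer of two checkpoints (hypothetical values).
    w_first = np.array([1.0, 0.0])
    w_second = np.array([0.0, 1.0])
    print("linear:", linear_merge(w_first, w_second, 0.5))
    print("slerp: ", slerp_merge(w_first, w_second, 0.5))
```

For orthogonal unit vectors, the linear midpoint shrinks in norm while the SLERP midpoint stays on the unit sphere, which is the usual motivation for preferring SLERP when the two checkpoints point in different directions. In a real merge the same function would be applied tensor by tensor across both models' state dictionaries.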
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
