LLMs Are Not a Silver Bullet: A Case Study on Software Fairness

Xinyue Li; Sixuan Li; Ying Xiao; Jie M. Zhang; Zhou Yang; Xuanzhe Liu; Zhenpeng Chen

arXiv:2604.12640 · cs.SE · April 15, 2026

TL;DR

This study compares ML-based and LLM-based methods for bias mitigation in software and finds that ML-based methods generally outperform LLM-based ones, which often rely on limited in-context learning and whose previously reported gains stem from artificial evaluation settings.

Contribution

It provides a large-scale comparison showing LLMs do not surpass traditional ML methods for fairness, highlighting limitations of current LLM-based bias mitigation approaches.

Findings

01. ML methods outperform LLMs in fairness and accuracy

02. Prior LLM studies' gains are due to artificial test data

03. Supervised fine-tuning of LLMs offers limited advantages
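
The first finding compares methods on two axes at once: fairness and accuracy. As a rough illustration of what such a comparison measures, the sketch below scores one set of binary predictions on both axes. It assumes statistical parity difference as the fairness metric, a common choice for tabular bias mitigation; this summary does not say which metrics the paper uses. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def statistical_parity_difference(
    y_pred: Sequence[int], protected: Sequence[int]
) -> float:
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged).

    Assumption: protected == 1 marks the privileged group and 0 the
    unprivileged group. A value of 0 means both groups receive
    favorable predictions at the same rate.
    """
    privileged = [p for p, g in zip(y_pred, protected) if g == 1]
    unprivileged = [p for p, g in zip(y_pred, protected) if g == 0]
    return sum(unprivileged) / len(unprivileged) - sum(privileged) / len(privileged)


# Toy data (hypothetical, not taken from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
protected = [1, 1, 1, 1, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)
spd = statistical_parity_difference(y_pred, protected)
print(f"accuracy={acc:.2f}, statistical parity difference={spd:.2f}")
```

Under this metric, one method beats another only if it moves the parity difference closer to zero without giving up accuracy, so both numbers have to be reported together.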

Abstract

Fairness is a critical requirement for human-related, high-stakes software systems, motivating extensive research on bias mitigation. Prior work has largely focused on tabular data settings using traditional Machine Learning (ML) methods. With the rapid rise of Large Language Models (LLMs), recent studies have begun to explore their use for bias mitigation in the same setting. However, it remains unclear whether LLM-based methods offer advantages over traditional ML methods, leaving software engineers without clear guidance for practical adoption. To address this gap, we present a large-scale study comparing state-of-the-art ML- and LLM-based bias mitigation methods. We find that ML-based methods consistently outperform LLM-based methods in both fairness and predictive performance, with even strong LLMs failing to surpass established ML baselines. To understand why prior LLM-based…
