Breaking Down Bias: On The Limits of Generalizable Pruning Strategies
Sibo Ma, Alejandro Salinas, Peter Henderson, Julian Nyarko

TL;DR
This paper investigates how far pruning can mitigate racial bias in large language models. It finds that such bias is only partially represented as a general concept and is otherwise context-specific, which limits how well any single pruning strategy generalizes.
Contribution
It demonstrates that neuron-based pruning can reduce bias effectively but faces significant limitations in generalizing across different contexts and bias types.
Findings
Pruning can reduce bias without increasing anomalous behavior.
Neuron-based pruning outperforms head-pruning in bias mitigation.
Generalization of pruning strategies across contexts is limited.
Abstract
We employ model pruning to examine how LLMs conceptualize racial biases, and whether a generalizable mitigation strategy for such biases appears feasible. Our analysis yields several novel insights. We find that pruning can be an effective method to reduce bias without significantly increasing anomalous model behavior. Neuron-based pruning strategies generally yield better results than approaches that prune entire attention heads. However, our results also show that the effectiveness of either approach quickly deteriorates as pruning strategies become more generalized. For instance, a model that is trained on removing racial biases in the context of financial decision-making generalizes poorly to biases in commercial transactions. Overall, our analysis suggests that racial biases are only partially represented as a general concept within language models. The other part of these biases is…
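The difference in granularity between the two pruning approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy activation vector, the `bias_scores` values, the `head_dim` layout, and the helper names `prune_neurons`, `prune_heads`, and `top_k` are all hypothetical, and how units are actually scored for bias is not specified in this summary.

```python
# Toy illustration only: all values and helper names below are hypothetical.

def top_k(scores, k):
    """Return the indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def prune_neurons(activations, neuron_ids):
    """Neuron-based pruning: zero out only the selected individual units."""
    pruned = set(neuron_ids)
    return [0.0 if i in pruned else a for i, a in enumerate(activations)]

def prune_heads(activations, head_ids, head_dim):
    """Head pruning: zero out every unit that belongs to a selected head."""
    pruned = set(head_ids)
    return [0.0 if (i // head_dim) in pruned else a
            for i, a in enumerate(activations)]

# Six hidden units arranged as three heads of two units each (assumed layout).
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
head_dim = 2

# Hypothetical per-unit scores for how strongly each unit contributes to bias.
bias_scores = [0.1, 0.9, 0.2, 0.6, 0.0, 0.3]

# Neuron-based: remove the two highest-scoring units.
top_neurons = top_k(bias_scores, 2)
neuron_pruned = prune_neurons(activations, top_neurons)

# Head-based: aggregate scores per head, then remove the highest-scoring head.
n_heads = len(activations) // head_dim
head_scores = [sum(bias_scores[h * head_dim:(h + 1) * head_dim])
               for h in range(n_heads)]
top_heads = top_k(head_scores, 1)
head_pruned = prune_heads(activations, top_heads, head_dim)

print(neuron_pruned)
print(head_pruned)
```

In this toy setup, head pruning also removes unit 0, which has a low score, while leaving the high-scoring unit 3 untouched. That coarser granularity is one plausible intuition for why neuron-based pruning could do better, though the sketch does not reproduce the paper's experiments.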
Taxonomy
Topics: Complex Systems and Decision Making
Methods: Softmax · Attention Is All You Need · Pruning
