Fairness-Aware Structured Pruning in Transformers
Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Ioana Baldini, Sarath Chandar

TL;DR
This paper introduces a fairness-aware structured pruning method for transformer models that reduces bias against diverse groups while maintaining language modeling performance, without requiring fine-tuning.
Contribution
It proposes a novel pruning technique that removes the attention heads that negatively affect fairness while retaining the heads critical for performance, improving model fairness with minimal performance loss and resource use.
Findings
Reduces gender bias by up to 39.5% in various models.
Maintains near-original performance after pruning.
Does not require fine-tuning of the pruned models.
Abstract
The increasing size of large language models (LLMs) has introduced challenges in their training and inference. Removing model components is perceived as a solution to tackle the large model sizes; however, existing pruning methods focus solely on performance, without considering an essential aspect for the responsible use of LLMs: model fairness. It is crucial to address the fairness of LLMs towards diverse groups, such as women, Black people, LGBTQ+, and Jewish communities, among others, as they are being deployed and made available to a wide audience. In this work, we first investigate how attention heads impact fairness and performance in pre-trained transformer-based language models. We then propose a novel method to prune the attention heads that negatively impact fairness while retaining the heads critical for performance, i.e., language modeling capabilities. Our approach is practical in…
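The head-selection idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `select_heads_to_prune`, the per-head scores `bias_impact` and `perf_impact`, and the budgets `n_protect` and `n_prune` are all hypothetical, and the scores are assumed to have been measured beforehand (for example, by masking one head at a time and recording the change in a bias metric and in language modeling quality).

```python
import numpy as np

def select_heads_to_prune(bias_impact, perf_impact, n_protect, n_prune):
    """Pick heads to prune under assumed per-head impact scores.

    bias_impact[i]: drop in measured bias when head i is removed
                    (positive means the head contributes to bias).
    perf_impact[i]: drop in performance when head i is removed
                    (positive means the head matters for performance).
    Both arrays are hypothetical inputs, not values from the paper.
    """
    bias_impact = np.asarray(bias_impact, dtype=float)
    perf_impact = np.asarray(perf_impact, dtype=float)

    # Protect the n_protect heads whose removal hurts performance most.
    by_perf = np.argsort(-perf_impact, kind="stable")
    protected = set(by_perf[:n_protect].tolist())

    # Among the rest, rank heads by how much they contribute to bias
    # and keep only those with a strictly positive contribution.
    by_bias = np.argsort(-bias_impact, kind="stable").tolist()
    candidates = [i for i in by_bias
                  if i not in protected and bias_impact[i] > 0]

    # Prune at most n_prune of the most bias-contributing candidates.
    return sorted(candidates[:n_prune])

# Toy scores for 10 heads (made-up numbers for illustration only).
bias = [0.5, 0.1, -0.2, 0.4, 0.3, 0.0, 0.2, -0.1, 0.05, 0.6]
perf = [0.9, 0.1, 0.2, 0.05, 0.3, 0.0, 0.1, 0.4, 0.2, 0.8]
pruned = select_heads_to_prune(bias, perf, n_protect=2, n_prune=3)
```

In this toy example, heads 0 and 9 contribute most to bias but are also the most important for performance, so they are protected and the selection falls on the next most bias-contributing heads instead. Because selection only needs the precomputed scores, no gradient updates are involved, which is consistent with the claim that the pruned model does not need fine-tuning.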
Taxonomy
Topics: Topic Modeling
Methods: Attention Is All You Need · Cosine Annealing · Softmax · Attention Dropout · Linear Layer · Multi-Head Attention · Pruning · Dense Connections · Linear Warmup With Cosine Annealing
