Fairness-Aware Structured Pruning in Transformers

Abdelrahman Zayed; Goncalo Mordido; Samira Shabanian; Ioana Baldini; Sarath Chandar

arXiv:2312.15398·cs.CL·December 27, 2023·1 cites

TL;DR

This paper introduces a fairness-aware structured pruning method for transformer models that reduces bias against diverse groups while maintaining performance, without requiring fine-tuning.

Contribution

It proposes a novel pruning technique targeting attention heads that affect fairness, improving model fairness with minimal performance loss and resource use.

Findings

1. Reduces gender bias by up to 39.5% in various models.
2. Maintains near-original performance after pruning.
3. Does not require fine-tuning of the pruned models.

Abstract

The increasing size of large language models (LLMs) has introduced challenges in their training and inference. Removing model components is perceived as a solution to tackle the large model sizes; however, existing pruning methods focus solely on performance, without considering an essential aspect for the responsible use of LLMs: model fairness. It is crucial to address the fairness of LLMs towards diverse groups, such as women, Black people, LGBTQ+, and Jewish communities, among others, as they are being deployed and made available to a wide audience. In this work, first, we investigate how attention heads impact fairness and performance in pre-trained transformer-based language models. We then propose a novel method to prune the attention heads that negatively impact fairness while retaining the heads critical for performance, i.e., language modeling capabilities. Our approach is practical in…
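The abstract is cut off before it gives the selection rule, so the following is only a minimal sketch of the general idea it describes: protect the heads that matter most for language modeling, then prune the remaining heads that contribute most to bias. The function name, the two per-head score lists, and the `protect_ratio` and `prune_ratio` parameters are all hypothetical and are not taken from the paper or from the chandar-lab/fasp repository.

```python
from typing import List, Set


def select_heads_to_prune(
    bias_impact: List[float],
    perf_impact: List[float],
    protect_ratio: float,
    prune_ratio: float,
) -> Set[int]:
    """Pick attention-head indices to prune (illustrative sketch only).

    bias_impact[i]: hypothetical score, positive when head i increases bias,
                    i.e. removing the head would reduce measured bias.
    perf_impact[i]: hypothetical score, larger when head i matters more
                    for language modeling performance.
    protect_ratio:  fraction of heads shielded from pruning (assumed knob).
    prune_ratio:    maximum fraction of heads to prune (assumed knob).
    """
    if len(bias_impact) != len(perf_impact):
        raise ValueError("score lists must have one entry per head")

    n_heads = len(bias_impact)
    n_protect = int(protect_ratio * n_heads)
    n_prune = int(prune_ratio * n_heads)

    # Step 1: shield the heads that are most important for performance.
    by_perf = sorted(range(n_heads), key=lambda i: perf_impact[i], reverse=True)
    protected = set(by_perf[:n_protect])

    # Step 2: among the unprotected heads, keep only those that increase bias.
    candidates = [
        i for i in range(n_heads)
        if i not in protected and bias_impact[i] > 0
    ]

    # Step 3: prune the candidates with the largest bias contribution first.
    candidates.sort(key=lambda i: bias_impact[i], reverse=True)
    return set(candidates[:n_prune])
```

With four heads, `select_heads_to_prune([0.5, 0.9, -0.1, 0.3], [0.1, 0.8, 0.2, 0.05], 0.25, 0.5)` protects head 1 (highest performance score, even though it is also the most biased) and returns heads 0 and 3. Head 2 is skipped because its bias score is negative. How the two scores would actually be measured is left open here.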

Code & Models

Repositories

chandar-lab/fasp
PyTorch · Official

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Cosine Annealing · Softmax · Attention Dropout · Linear Layer · Multi-Head Attention · Pruning · Dense Connections · Linear Warmup With Cosine Annealing