Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection

Yeshwanth Venkatesha; Souvik Kundu; Priyadarshini Panda

arXiv:2506.00743·cs.CL·June 3, 2025

Open Access

TL;DR

This paper introduces a federated PEFT method for multi-head-attention language models that combines head pruning, head-specific weighted aggregation, and client selection to cut client training cost and communication while maintaining accuracy.

Contribution

It presents a new method combining head pruning, weighted aggregation, and client selection to efficiently perform PEFT in federated learning for LLMs.

Findings

1. Achieves up to 90% sparsity with minimal accuracy loss
2. Reduces communication by up to 1.8x and training operations by 3.9x
3. Effective on multiple NLP benchmarks

Abstract

Parameter-Efficient Fine-Tuning (PEFT) has become the de facto approach for adapting Large Language Models (LLMs) to downstream tasks in Natural Language Processing. However, its adoption in privacy-preserving distributed learning frameworks, such as Federated Learning (FL), remains relatively limited. This is mainly due to challenges specific to FL, such as resource-constrained devices and diverse data distributions among clients. In this paper, we propose an efficient method to perform PEFT within the FL framework for Multi-Head Attention (MHA) based language models. We address these challenges through head pruning, a novel head-specific weighted aggregation mechanism, and a client selection strategy. Head pruning reduces training complexity on the clients and is guided by an importance score computed from the confidence of each attention head. Weighted aggregation of heads ensures…
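To make the two head-level steps in the abstract more concrete, here is a minimal sketch in plain Python. It rests on assumptions that the truncated abstract does not confirm: "confidence" is taken to be the mean of a head's largest attention weight per query position (a common definition, possibly not the paper's exact score), and the server is assumed to average each head's update across clients in proportion to the importance scores they report. The names `head_confidence`, `keep_mask`, and `aggregate_heads` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

# attn[h][q] is one softmax row: a probability distribution over key positions.
AttentionMaps = Sequence[Sequence[Sequence[float]]]


def head_confidence(attn: AttentionMaps) -> List[float]:
    """Assumed score: mean over queries of each head's largest attention weight."""
    return [sum(max(row) for row in head) / len(head) for head in attn]


def keep_mask(scores: Sequence[float], sparsity: float) -> List[bool]:
    """Prune the lowest-scoring `sparsity` fraction of heads; True means keep."""
    n_prune = int(len(scores) * sparsity)
    order = sorted(range(len(scores)), key=lambda h: scores[h])  # ascending
    pruned = set(order[:n_prune])
    return [h not in pruned for h in range(len(scores))]


def aggregate_heads(
    updates: Sequence[Sequence[Sequence[float]]],  # updates[client][head] = flat params
    scores: Sequence[Sequence[float]],             # scores[client][head] = importance
    masks: Sequence[Sequence[bool]],               # masks[client][head] = head was trained
) -> List[List[float]]:
    """Assumed rule: per-head average of client updates, weighted by importance."""
    n_clients, n_heads = len(updates), len(updates[0])
    merged = []
    for h in range(n_heads):
        dim = len(updates[0][h])
        # Only clients that actually kept (trained) this head contribute.
        clients = [c for c in range(n_clients) if masks[c][h]]
        total = sum(scores[c][h] for c in clients)
        if total == 0:
            merged.append([0.0] * dim)  # no contributor: leave the head unchanged
            continue
        merged.append([
            sum(scores[c][h] * updates[c][h][i] for c in clients) / total
            for i in range(dim)
        ])
    return merged
```

In this sketch a client would call `head_confidence` on its local attention maps, derive a mask with `keep_mask(scores, 0.9)` for 90% sparsity, train and upload only the kept heads, and the server would combine the uploads with `aggregate_heads`. Client selection is not covered here.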

Taxonomy

Topics: Mobile Agent-Based Network Management · Peer-to-Peer Network Technologies

Methods: Attention Is All You Need · Linear Layer · Softmax · Pruning · Multi-Head Attention