Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection
Yeshwanth Venkatesha, Souvik Kundu, Priyadarshini Panda

TL;DR
This paper introduces a novel federated PEFT approach for large language models that employs head pruning, weighted aggregation, and client selection to reduce resource use and communication costs while maintaining accuracy.
Contribution
It presents a new method combining head pruning, weighted aggregation, and client selection to efficiently perform PEFT in federated learning for LLMs.
Findings
Achieves up to 90% attention-head sparsity with minimal accuracy loss
Reduces communication by up to 1.8x and training operations by 3.9x
Effective on multiple NLP benchmarks
Abstract
Parameter Efficient Fine-Tuning (PEFT) has become the de-facto approach in adapting Large Language Models (LLMs) for downstream tasks in Natural Language Processing. However, its adoption in privacy-preserving distributed learning frameworks, such as Federated Learning (FL), remains relatively limited. This is mainly due to challenges specific to FL, such as resource-constrained devices and diverse data distributions among clients. In this paper, we propose an efficient method to perform PEFT within the FL framework for Multi-Head Attention (MHA) based language models. We address the challenges through head pruning, a novel head-specific weighted aggregation mechanism, and a client selection strategy. Head pruning minimizes training complexity within the clients, guided by the importance score computed based on the confidence of the attention head. Weighted aggregation of heads ensures…
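The abstract's first two ingredients, confidence-guided head pruning and head-specific weighted aggregation, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a head's "confidence" is the mean of its maximum attention weight over query positions (one common definition), and that the server weights each client's update for a head by the importance score that client reports. The function names `head_confidence`, `prune_mask`, and `aggregate_head` are hypothetical.

```python
from typing import List


def head_confidence(attn: List[List[float]]) -> float:
    """Assumed importance score: mean over query positions of the
    largest attention weight in each row of one head's attention map."""
    return sum(max(row) for row in attn) / len(attn)


def prune_mask(scores: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask keeping the top (1 - sparsity) fraction of heads.
    At least one head is always kept."""
    n_keep = max(1, round(len(scores) * (1.0 - sparsity)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]


def aggregate_head(updates: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted average of one head's parameter updates across clients.
    A client that pruned this head reports weight 0 and contributes nothing."""
    total = sum(weights)
    if total == 0:
        # No client trained this head in this round: return a zero update.
        return [0.0] * len(updates[0])
    dim = len(updates[0])
    return [sum(w * u[k] for w, u in zip(weights, updates)) / total
            for k in range(dim)]


if __name__ == "__main__":
    # Toy example: four heads on one client, keep the most confident half.
    scores = [0.2, 0.9, 0.5, 0.1]
    print(prune_mask(scores, sparsity=0.5))

    # Two clients send updates for the same head with different importance.
    print(aggregate_head([[1.0, 2.0], [3.0, 4.0]], weights=[0.25, 0.75]))
```

In this sketch, pruning happens per client before local training, so only the surviving heads need to be trained and sent. The server then averages each head separately, so a client that judged a head unimportant has little or no influence on it.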
Taxonomy
Topics: Mobile Agent-Based Network Management · Peer-to-Peer Network Technologies
Methods: Attention Is All You Need · Linear Layer · Softmax · Pruning · Multi-Head Attention
