Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks

Xiumei Deng; Zehui Xiong; Binbin Chen; Dong In Kim; Merouane Debbah; H. Vincent Poor

arXiv:2511.02647·cs.DC·November 5, 2025

TL;DR

Federated Attention (FedAttn) is a distributed LLM inference framework that integrates the federated paradigm into the self-attention mechanism, aiming to protect prompt privacy while reducing communication overhead and computational cost in collaborative inference at the edge.

Contribution

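This work presents FedAttn, a distributed inference method for LLMs in which participants run self-attention locally over their own token representations and periodically exchange and aggregate Key-Value (KV) matrices, so that responses are generated collaboratively without exposing private prompts.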

Findings

1. Theoretical analysis of error propagation in FedAttn.
2. Trade-off characterization between response quality and efficiency.
3. Experimental validation showing scalability and optimization opportunities.

Abstract

Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scenarios confronts fundamental challenges: privacy vulnerabilities, communication overhead, and computational bottlenecks. To address these, we propose Federated Attention (FedAttn), which integrates the federated paradigm into the self-attention mechanism, creating a new distributed LLM inference framework that simultaneously achieves privacy protection, communication efficiency, and computational efficiency. FedAttn enables participants to perform local self-attention over their own token representations while periodically exchanging and aggregating Key-Value (KV) matrices across multiple Transformer blocks, collaboratively generating LLM responses without exposing private prompts. Further,…
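The mechanism the abstract describes (local self-attention, with KV matrices exchanged only at some Transformer blocks) can be illustrated with a toy sketch. This is not the authors' algorithm: the function names `attend` and `fed_attn_sketch`, the `sync_every` parameter, the use of hidden states directly as queries, keys, and values (no learned projections), aggregation by plain concatenation, the residual update, and the absence of masking are all assumptions made for illustration.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    # Scaled dot-product attention, unmasked and without learned projections
    # (a simplification for this sketch).
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)])
    return out


def fed_attn_sketch(participants, num_blocks, sync_every):
    # participants: one list of token vectors per participant.
    # sync_every (hypothetical parameter): every sync_every-th block uses
    # the KV aggregated from all participants; other blocks stay local.
    states = [[tok[:] for tok in p] for p in participants]
    for block in range(1, num_blocks + 1):
        if block % sync_every == 0:
            # Sync block: aggregate KV by concatenation (an assumption).
            shared = [tok for p in states for tok in p]
            contexts = [shared] * len(states)
        else:
            # Local block: each participant sees only its own tokens.
            contexts = states
        new_states = []
        for hidden, ctx in zip(states, contexts):
            attn = attend(hidden, ctx, ctx)
            # Residual update: h <- h + Attention(h, ctx, ctx).
            new_states.append(
                [[x + a for x, a in zip(tok, o)] for tok, o in zip(hidden, attn)]
            )
        states = new_states
    return states


if __name__ == "__main__":
    prompts = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
    print(fed_attn_sketch(prompts, num_blocks=4, sync_every=2))
```

In this toy, `sync_every=1` makes every block attend over all participants' tokens, which matches attention over the concatenated sequence, while a value larger than `num_blocks` keeps every block purely local. Intermediate values exchange KV less often, which is one way to picture a trade-off between response quality and efficiency.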

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Big Data and Digital Economy