Federated Learning for Inference at Anytime and Anywhere
Zicheng Liu, Da Li, Javier Fernandez-Marques, Stefanos Laskaridis, Yan Gao, Łukasz Dudziak, Stan Z. Li, Shell Xu Hu, Timothy Hospedales

TL;DR
This paper introduces a novel federated learning framework that adapts pre-trained Transformer models using attention-based adapters, enabling efficient, personalized, and scalable inference across heterogeneous devices.
Contribution
Proposes a new method to adapt pre-trained Transformers in federated learning using attention-based adapters, improving efficiency, personalization, and scalability.
Findings
Fast and communication-efficient training with heterogeneous data.
Supports diverse device capabilities for inference.
Achieves accurate and scalable federated learning results.
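One way to picture inference across devices with different compute budgets is an early-exit loop: a device runs only as many blocks as it can afford and returns the prediction attached to the last block it executed. The sketch below is illustrative only and is not the paper's implementation. The names `anytime_predict`, `blocks`, `exit_heads`, and `budget` are hypothetical, and the toy blocks are plain functions standing in for Transformer blocks with per-block prediction heads.

```python
from typing import Any, Callable, Sequence


def anytime_predict(
    x: Any,
    blocks: Sequence[Callable[[Any], Any]],
    exit_heads: Sequence[Callable[[Any], Any]],
    budget: int,
) -> Any:
    """Run at most `budget` blocks, then predict with that block's exit head.

    Hypothetical helper: `blocks[k]` stands in for the k-th block and
    `exit_heads[k]` for the early-prediction head attached to it.
    """
    if len(blocks) != len(exit_heads):
        raise ValueError("need exactly one exit head per block")
    if not 1 <= budget <= len(blocks):
        raise ValueError("budget must be between 1 and the number of blocks")
    hidden = x
    for block in blocks[:budget]:
        hidden = block(hidden)
    # The prediction comes from the head of the last block that was executed.
    return exit_heads[budget - 1](hidden)


# Toy stand-ins: each block adds 1, each head reports (exit index, hidden value).
toy_blocks = [lambda h: h + 1 for _ in range(3)]
toy_heads = [lambda h, k=k: (k, h) for k in range(3)]

low_budget = anytime_predict(0, toy_blocks, toy_heads, budget=1)
full_budget = anytime_predict(0, toy_blocks, toy_heads, budget=3)
```

A weak device would call this with a small `budget` and a strong device with the full depth, while both share the same set of weights.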
Abstract
Federated learning has been predominantly concerned with collaborative training of deep networks from scratch, and especially the many challenges that arise, such as communication cost, robustness to heterogeneous data, and support for diverse device capabilities. However, there is no unified framework that addresses all these problems together. This paper studies the challenges and opportunities of exploiting pre-trained Transformer models in FL. In particular, we propose to efficiently adapt such pre-trained models by injecting a novel attention-based adapter module at each transformer block that both modulates the forward pass and makes an early prediction. Training only the lightweight adapter by FL leads to fast and communication-efficient learning even in the presence of heterogeneous data and devices. Extensive experiments on standard FL benchmarks, including CIFAR-100, FEMNIST…
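The communication saving from training only the adapter can be sketched with plain federated averaging: each client sends back only its adapter parameters, and the server averages them weighted by local dataset size while the pre-trained backbone stays frozen. This is a minimal sketch under stated assumptions, not the paper's algorithm. FedAvg is swapped in here as a generic aggregator, and the `adapter.` key prefix, the function names, and the toy values are all hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def trainable_subset(model: Params, prefix: str = "adapter.") -> Params:
    """Keep only the parameters a client would upload (assumed key prefix)."""
    return {name: values for name, values in model.items() if name.startswith(prefix)}


def fedavg(client_updates: List[Tuple[Params, int]]) -> Params:
    """Average parameter vectors, weighted by each client's sample count."""
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    reference = client_updates[0][0]
    averaged: Params = {}
    for name, values in reference.items():
        averaged[name] = [
            sum(params[name][i] * n for params, n in client_updates) / total
            for i in range(len(values))
        ]
    return averaged


# Toy clients: identical frozen backbone, different locally trained adapters.
client_a = {"backbone.w": [1.0, 1.0], "adapter.w": [0.0, 2.0]}
client_b = {"backbone.w": [1.0, 1.0], "adapter.w": [4.0, 6.0]}

# Only the adapter entries are communicated and aggregated.
updates = [(trainable_subset(client_a), 1), (trainable_subset(client_b), 3)]
global_adapter = fedavg(updates)
```

In this toy setup each client uploads one vector instead of two. With a real pre-trained Transformer the backbone would dominate the parameter count, so the relative saving would be much larger.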
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Adam · Softmax · Layer Normalization · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Adapter
