Federated Learning for Inference at Anytime and Anywhere

Zicheng Liu; Da Li; Javier Fernandez-Marques; Stefanos Laskaridis; Yan Gao; Łukasz Dudziak; Stan Z. Li; Shell Xu Hu; Timothy Hospedales

arXiv:2212.04084 · cs.LG · December 9, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a novel federated learning framework that adapts pre-trained Transformer models using attention-based adapters, enabling efficient, personalized, and scalable inference across heterogeneous devices.

Contribution

Proposes a new method to adapt pre-trained Transformers in federated learning using attention-based adapters, improving efficiency, personalization, and scalability.

Findings

01. Fast and communication-efficient training with heterogeneous data.

02. Supports diverse device capabilities for inference.

03. Achieves accurate and scalable federated learning results.
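Finding 02 can be pictured as depth-limited inference: a device runs only as many Transformer blocks as its budget allows and reads the prediction from the exit it reaches. The following is a minimal sketch under that assumption, not the paper's implementation. The function `anytime_predict`, the toy additive blocks, and the argmax head are hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def anytime_predict(
    x: Vector,
    blocks: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], int]],
    budget: int,
) -> int:
    """Run at most `budget` blocks and return the prediction of the last exit reached."""
    if budget < 1:
        raise ValueError("budget must allow at least one block")
    depth = min(budget, len(blocks))
    h = x
    for block in blocks[:depth]:
        h = block(h)
    # Each block has its own early-prediction head (assumed here).
    return heads[depth - 1](h)


def make_block(shift: Vector) -> Callable[[Vector], Vector]:
    """Toy stand-in for a Transformer block: adds a fixed shift to the features."""
    return lambda h: [v + s for v, s in zip(h, shift)]


def argmax_head(h: Vector) -> int:
    """Toy stand-in for an early-exit classifier: index of the largest feature."""
    return max(range(len(h)), key=h.__getitem__)


blocks = [make_block([0.0, 0.4]) for _ in range(3)]
heads = [argmax_head] * 3
x = [1.0, 0.0]

low_power = anytime_predict(x, blocks, heads, budget=1)  # weak device, one block
full = anytime_predict(x, blocks, heads, budget=3)       # strong device, all blocks
```

In this toy setup the shallow exit and the full-depth exit disagree, which illustrates the trade-off a real deployment would have to measure: earlier exits cost less compute but may be less accurate.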

Abstract

Federated learning (FL) has been predominantly concerned with collaborative training of deep networks from scratch, and especially the many challenges that arise, such as communication cost, robustness to heterogeneous data, and support for diverse device capabilities. However, there is no unified framework that addresses all these problems together. This paper studies the challenges and opportunities of exploiting pre-trained Transformer models in FL. In particular, we propose to efficiently adapt such pre-trained models by injecting a novel attention-based adapter module at each Transformer block that both modulates the forward pass and makes an early prediction. Training only the lightweight adapter by FL leads to fast and communication-efficient learning even in the presence of heterogeneous data and devices. Extensive experiments on standard FL benchmarks, including CIFAR-100, FEMNIST…
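The communication saving described in the abstract comes from exchanging only the adapter parameters while the pre-trained backbone stays frozen on every client. The sketch below illustrates one server round under a loud assumption: the abstract does not name the aggregation rule, so sample-count-weighted averaging (FedAvg) is swapped in here. The name `fedavg_adapters` and the parameter key `adapter.w` are hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def fedavg_adapters(client_updates: List[Tuple[Params, int]]) -> Params:
    """Average adapter parameters across clients, weighted by local sample count.

    Only adapter tensors appear in `client_updates`; the frozen backbone is
    never transmitted, which is where the communication saving comes from.
    FedAvg is an assumption of this sketch, not a rule stated in the abstract.
    """
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("clients must report a positive number of samples")
    reference = client_updates[0][0]
    averaged: Params = {}
    for key, values in reference.items():
        averaged[key] = [
            sum(params[key][i] * n for params, n in client_updates) / total
            for i in range(len(values))
        ]
    return averaged


# Two hypothetical clients with different amounts of local data.
updates = [
    ({"adapter.w": [1.0, 2.0]}, 1),
    ({"adapter.w": [3.0, 6.0]}, 3),
]
global_adapter = fedavg_adapters(updates)
```

The client holding three times as much data pulls the average three times as hard, so `global_adapter["adapter.w"]` lands closer to the second client's values.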

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Adam · Softmax · Layer Normalization · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Adapter