The Prospect of Enhancing Large-Scale Heterogeneous Federated Learning with Transformers
Yulan Gao, Zhaoxiang Hou, Chengyi Yang, Zengxiang Li, Han Yu

TL;DR
This paper explores the potential of Transformer-based models in large-scale heterogeneous federated learning, demonstrating their advantages over traditional neural networks through extensive experiments and analysis.
Contribution
It introduces the use of Transformers in federated learning for improved generalization and personalization in large-scale, heterogeneous data scenarios.
Findings
Transformers outperform ResNet in large-scale heterogeneous FL tasks.
Transformers show higher Centered Kernel Alignment (CKA) representation similarity across layers, which the analysis associates with better feature extraction.
Experimental results validate the effectiveness of Transformer-based FL models.
Abstract
Federated learning (FL) addresses data privacy concerns by enabling collaborative training of AI models across distributed data owners. Wide adoption of FL faces the fundamental challenges of data heterogeneity and the large scale of data owners involved. In this paper, we investigate the prospect of Transformer-based FL models for achieving generalization and personalization in this setting. We conduct extensive comparative experiments involving FL with Transformers, ResNet, and personalized ResNet-based FL approaches under various scenarios. These experiments consider varying numbers of data owners to demonstrate Transformers' advantages over deep neural networks in large-scale heterogeneous FL tasks. In addition, we analyze the superior performance of Transformers by comparing the Centered Kernel Alignment (CKA) representation similarity across different layers and FL models to gain…
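The CKA comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes the linear variant of CKA, which the abstract does not specify, and uses random matrices as stand-ins for layer activations. The function name `linear_cka` and the arrays `acts_a`, `acts_b`, and `acts_c` are hypothetical and are not taken from the paper.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, n_features).

    Assumption: linear kernel; the paper may use a different CKA variant.
    """
    # Center each feature column so the similarity ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


# Synthetic stand-ins for activations of one layer on the same 64 inputs.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(64, 16))

# An orthogonal rotation of acts_a: same representation, different basis.
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
acts_b = acts_a @ q

# Unrelated activations, e.g. from an independently initialized model.
acts_c = rng.normal(size=(64, 16))

print("same representation, rotated:", linear_cka(acts_a, acts_b))
print("unrelated representations:   ", linear_cka(acts_a, acts_c))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so the rotated copy scores essentially 1 while the unrelated activations score lower. In an FL setting, one would replace the synthetic arrays with activations from the same layer of two client models (or two layers of one model) evaluated on a shared batch of inputs.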
Taxonomy
Topics: Privacy-Preserving Technologies in Data
Methods: 1x1 Convolution · Kaiming Initialization · Residual Connection · Bottleneck Residual Block · Average Pooling · Convolution · Max Pooling · Batch Normalization · Residual Block
