Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point

Bokun Wang; Axel Berg; Durmus Alp Emre Acar; Chuteng Zhou

arXiv:2407.02610 · cs.LG · July 31, 2025



Open Access

TL;DR

This paper explores 8-bit floating point (FP8) training in federated learning, which cuts client-server communication costs by at least 2.9x relative to an FP32 baseline and enables efficient on-device training while reaching the same model accuracy.

Contribution

It introduces a novel FP8 federated learning method with convergence analysis and demonstrates substantial communication savings across multiple models and datasets.

Findings

01 Achieves at least a 2.9x reduction in communication costs

02 Maintains accuracy comparable to the FP32 baseline

03 Validates effectiveness across diverse models and datasets

Abstract

Recent work has shown that 8-bit floating point (FP8) can be used to train neural networks efficiently, with reduced computational cost compared to training in FP32/FP16. In this work, we investigate the use of FP8 training in a federated learning context. This approach not only brings the usual benefits of FP8, which are desirable for on-device training at the edge, but also reduces client-server communication costs due to significant weight compression. We present a novel method for combining FP8 client training with a global FP32 server model and provide a convergence analysis. Experiments with various machine learning models and datasets show that, to reach the same trained model accuracy, our method consistently reduces communication by at least 2.9x compared to an FP32 baseline.
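To make the setup in the abstract concrete, the following is a minimal sketch of one communication round in which clients send FP8-rounded weights and the server averages them at higher precision. It is not the authors' method: the E4M3 format, the per-tensor scale, round-to-nearest rounding, and the helper names `quantize_e4m3`, `compress`, `decompress`, `aggregate`, and `payload_ratio` are all assumptions chosen for illustration. Python floats stand in for the server's FP32 values.

```python
import math

E4M3_MAX = 448.0      # largest finite value in the common FP8 E4M3 format (assumed format)
MANTISSA_BITS = 3
MIN_NORMAL_EXP = -6   # smallest normal exponent in E4M3


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value on a simulated FP8 E4M3 grid."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)           # saturate instead of overflowing
    _, e = math.frexp(mag)                # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)      # smaller exponents share the subnormal grid
    step = 2.0 ** (exp - MANTISSA_BITS)   # spacing between neighbouring FP8 values
    return sign * min(round(mag / step) * step, E4M3_MAX)


def compress(weights):
    """Client side (hypothetical): one scale per tensor plus FP8-rounded values."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / E4M3_MAX if max_abs > 0 else 1.0
    return scale, [quantize_e4m3(w / scale) for w in weights]


def decompress(scale, values):
    """Map FP8-rounded values back to the original weight range."""
    return [scale * v for v in values]


def aggregate(payloads):
    """Server side (hypothetical): dequantize each payload, then average."""
    clients = [decompress(s, q) for s, q in payloads]
    return [sum(col) / len(clients) for col in zip(*clients)]


def payload_ratio(n_weights: int) -> float:
    """FP32 bytes over FP8 bytes for one tensor; the FP8 payload adds one 4-byte scale."""
    return (4 * n_weights) / (n_weights + 4)


clients = [[0.10, -0.52, 0.33], [0.12, -0.48, 0.29]]
global_weights = aggregate([compress(w) for w in clients])
print(global_weights, payload_ratio(1_000_000))
```

Under these assumptions the per-payload ratio approaches 4x for large tensors. That is a different quantity from the 2.9x figure in the abstract, which is measured as the total communication needed to reach the same trained model accuracy.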


Taxonomy

Topics: DNA and Biological Computing · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques