To Store or Not? Online Data Selection for Federated Learning with Limited Storage

Chen Gong; Zhenzhe Zheng; Yunfeng Shao; Bingshuai Li; Fan Wu; Guihai Chen

arXiv:2209.00195·cs.LG·February 28, 2023

Open Access

TL;DR

This paper introduces an online data selection framework for federated learning with limited on-device storage, improving training speed and accuracy by keeping only the most valuable data samples.

Contribution

It proposes a new data valuation metric with theoretical guarantees and the ODE framework for effective data selection in federated learning with storage constraints.

Findings

01 Speeds up training by up to 2.5x

02 Increases inference accuracy by 6% on industrial data

03 Demonstrates robustness across various practical factors

Abstract

Machine learning models have been deployed in mobile networks to deal with massive data from different layers to enable automated network management and intelligence on devices. To overcome the high communication cost and severe privacy concerns of centralized machine learning, federated learning (FL) has been proposed to achieve distributed machine learning among networked devices. While computation and communication limitations have been widely studied, the impact of on-device storage on the performance of FL remains unexplored. Without an effective data selection policy to filter the massive streaming data on devices, classical FL can suffer from much longer model training time ($4\times$) and a significant reduction in inference accuracy ($7\%$), as observed in our experiments. In this work, we take the first step toward online data selection for FL with limited on-device storage.…
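To make the problem setting concrete, here is a minimal sketch of online selection under a fixed storage budget: samples arrive one at a time, and the device keeps only the highest-scored ones. This is not the paper's ODE framework or its valuation metric, neither of which is specified in this excerpt. The function name `select_online` and the `score` callback are hypothetical placeholders for whatever per-sample value a device could compute.

```python
import heapq
from typing import Any, Callable, Iterable, List, Tuple


def select_online(
    stream: Iterable[Any],
    capacity: int,
    score: Callable[[Any], float],
) -> List[Any]:
    """Keep at most `capacity` samples from `stream`, preferring higher scores.

    `score` is a hypothetical stand-in for a data valuation metric; it is
    an assumption of this sketch, not the metric proposed in the paper.
    """
    if capacity <= 0:
        return []
    # Min-heap of (score, arrival index, sample): the lowest score sits at heap[0].
    heap: List[Tuple[float, int, Any]] = []
    for index, sample in enumerate(stream):
        # The arrival index breaks ties so samples themselves are never compared.
        entry = (score(sample), index, sample)
        if len(heap) < capacity:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # Storage is full: evict the lowest-scored stored sample.
            heapq.heapreplace(heap, entry)
    # Return the stored samples in their original arrival order.
    return [sample for _, _, sample in sorted(heap, key=lambda e: e[1])]
```

For example, calling `select_online([0.1, 0.9, 0.3, 0.7, 0.5], 2, lambda x: x)` keeps the two highest-scored items. Each arriving sample costs O(log capacity) time, and memory never exceeds the storage budget, which is the constraint the abstract is concerned with.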

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Age of Information Optimization