To Store or Not? Online Data Selection for Federated Learning with Limited Storage
Chen Gong, Zhenzhe Zheng, Yunfeng Shao, Bingshuai Li, Fan Wu, Guihai Chen

TL;DR
This paper introduces an online data selection framework for federated learning with limited device storage, improving training speed and accuracy by filtering valuable data samples.
Contribution
It proposes a new data valuation metric with theoretical guarantees and the ODE framework for effective data selection in federated learning with storage constraints.
Findings
Achieves up to 2.5x faster training
Increases inference accuracy by 6% on industrial data
Demonstrates robustness across various practical factors
Abstract
Machine learning models have been deployed in mobile networks to deal with massive data from different layers to enable automated network management and intelligence on devices. To overcome the high communication cost and severe privacy concerns of centralized machine learning, federated learning (FL) has been proposed to achieve distributed machine learning among networked devices. While the computation and communication limitations have been widely studied, the impact of on-device storage on the performance of FL is still not explored. Without an effective data selection policy to filter the massive streaming data on devices, classical FL can suffer from much longer model training time and a significant reduction in inference accuracy, as observed in our experiments. In this work, we take the first step to consider the online data selection for FL with limited on-device storage.…
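To make the storage constraint concrete, the following is a minimal sketch of online selection under a fixed buffer size. It is not the paper's ODE framework or its valuation metric: `select_online`, `capacity`, and `value_fn` are hypothetical names introduced here, and the scoring function is left to the caller. The sketch only illustrates the general idea of keeping the highest-valued samples seen so far from a stream, deciding for each arriving sample whether to store it or discard it.

```python
import heapq
from itertools import count
from typing import Any, Callable, Iterable, List


def select_online(
    stream: Iterable[Any],
    capacity: int,
    value_fn: Callable[[Any], float],
) -> List[Any]:
    """Keep the `capacity` highest-valued samples from a stream.

    `value_fn` is a placeholder for any per-sample valuation score;
    it is an assumption of this sketch, not the metric from the paper.
    """
    if capacity <= 0:
        return []

    heap: list = []      # min-heap of (value, arrival_index, sample)
    arrival = count()    # tie-breaker so samples are never compared directly

    for sample in stream:
        entry = (value_fn(sample), next(arrival), sample)
        if len(heap) < capacity:
            # Buffer not full yet: store the sample unconditionally.
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # Buffer full: evict the lowest-valued stored sample.
            heapq.heapreplace(heap, entry)
        # Otherwise the new sample is discarded.

    # Return stored samples, highest value first.
    return [sample for _, _, sample in sorted(heap, reverse=True)]
```

For example, calling `select_online(samples, capacity=3, value_fn=score)` with a caller-supplied `score` keeps at most three samples in memory at any time. Each arrival costs O(log capacity), so the buffer size, not the stream length, bounds the storage used.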
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Age of Information Optimization
