A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices
Chen Gong, Rui Xing, Zhenzhe Zheng, Fan Wu

TL;DR
This paper introduces Titan, a two-stage data selection framework that enhances data utilization and training efficiency on edge devices, leading to faster training and improved accuracy.
Contribution
The paper presents a novel two-stage data selection method with a theoretically optimal strategy for on-device model training, improving efficiency and accuracy.
Findings
Up to 43% reduction in training time
6.2% increase in final accuracy
Only minor system overhead
Abstract
The demand for machine learning (ML) model training on edge devices is escalating due to data privacy and personalized service needs. However, we observe that current on-device model training is hampered by the under-utilization of on-device data, due to low training throughput, limited storage and diverse data importance. To improve data resource utilization, we propose a two-stage data selection framework, Titan, which selects the most important data batch from streaming data for model training with guaranteed efficiency and effectiveness. Specifically, in the first stage, Titan filters out a candidate dataset with potentially high importance in a coarse-grained manner. In the second stage of fine-grained selection, we propose a theoretically optimal data selection strategy to identify the data batch that yields the highest model performance improvement in the current training round. To…
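The abstract describes the pipeline only at a high level. The following is a minimal sketch, under stated assumptions, of how a two-stage selector over streaming data could be organized: a bounded buffer for the coarse-grained stage and a per-round ranking for the fine-grained stage. Everything here is hypothetical and for illustration only. The linear least-squares model, the absolute-residual proxy used for coarse filtering, and the squared per-sample gradient norm used for fine-grained ranking are stand-ins. They are not Titan's actual importance measures or its theoretically optimal strategy. The names `coarse_filter`, `fine_select`, `sgd_step` and `run_demo` are invented for this example.

```python
import heapq
import random
from typing import Iterable, List, Tuple

# Hypothetical sample format: (feature vector, regression target).
Sample = Tuple[List[float], float]


def predict(w: List[float], x: List[float]) -> float:
    """Linear model prediction w . x (stand-in for the on-device model)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def coarse_filter(stream: Iterable[Sample], w: List[float], capacity: int) -> List[Sample]:
    """Stage 1 (assumed proxy): one pass over the stream, keeping only the
    `capacity` samples with the largest absolute residual under the current
    model. A min-heap bounds storage to `capacity` entries."""
    heap: List[Tuple[float, int, Sample]] = []  # (score, arrival index, sample)
    for i, (x, y) in enumerate(stream):
        entry = (abs(predict(w, x) - y), i, (x, y))
        if len(heap) < capacity:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the least important candidate
    return [sample for _, _, sample in heap]


def grad(w: List[float], sample: Sample) -> List[float]:
    """Gradient of the squared loss 0.5 * (w . x - y)^2 with respect to w."""
    x, y = sample
    r = predict(w, x) - y
    return [r * xi for xi in x]


def fine_select(candidates: List[Sample], w: List[float], batch_size: int) -> List[Sample]:
    """Stage 2 (stand-in heuristic, not the paper's optimal strategy): rank
    candidates by squared per-sample gradient norm, a first-order proxy for
    the loss reduction a gradient step on that sample would give, and keep
    the top `batch_size`."""
    def score(s: Sample) -> float:
        return sum(g * g for g in grad(w, s))
    return sorted(candidates, key=score, reverse=True)[:batch_size]


def sgd_step(w: List[float], batch: List[Sample], lr: float) -> List[float]:
    """One SGD update using the mean gradient over the selected batch."""
    total = [0.0] * len(w)
    for s in batch:
        for j, g in enumerate(grad(w, s)):
            total[j] += g
    return [wj - lr * tj / len(batch) for wj, tj in zip(w, total)]


def run_demo(rounds: int = 50, seed: int = 0) -> List[float]:
    """Simulate training rounds on synthetic streaming data (true weights [2, -1])."""
    rng = random.Random(seed)
    true_w = [2.0, -1.0]
    w = [0.0, 0.0]
    for _ in range(rounds):
        # Each round, a fresh chunk of 200 noisy samples arrives on the device.
        stream = []
        for _ in range(200):
            x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
            stream.append((x, predict(true_w, x) + rng.gauss(0.0, 0.05)))
        candidates = coarse_filter(stream, w, capacity=32)  # stage 1: coarse filter
        batch = fine_select(candidates, w, batch_size=8)     # stage 2: fine selection
        w = sgd_step(w, batch, lr=0.5)
    return w


if __name__ == "__main__":
    print([round(v, 3) for v in run_demo()])
```

The heap keeps memory at O(`capacity`) no matter how long the stream is, which mirrors the limited-storage constraint the abstract mentions. The more expensive per-sample scoring in the second stage then runs only over the small candidate set rather than over every arriving sample.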
