Toward Storage-Aware Learning with Compressed Data: An Empirical Exploratory Study on JPEG

Kichang Lee; Songkuk Kim; JaeYeon Park; JeongGil Ko

arXiv:2508.12833·cs.LG·December 24, 2025

Open Access

TL;DR

This paper empirically investigates how JPEG compression affects on-device machine learning under limited storage. It finds that naive strategies, such as uniform data dropping or one-size-fits-all compression, are suboptimal, and that samples differ in their sensitivity to compression, which supports the feasibility of sample-wise adaptive compression.

Contribution

It systematically characterizes storage-aware learning challenges and demonstrates the potential of adaptive compression strategies based on data sensitivity.

Findings

01. Naive compression strategies are suboptimal.
02. Data samples vary in sensitivity to compression.
03. Adaptive compression can improve storage efficiency.
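
To make the two naive baselines concrete, the sketch below works through the storage arithmetic for uniform data dropping and one-size-fits-all compression. The function names, image sizes, and budget are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def uniform_drop(num_samples, bytes_per_sample, budget_bytes):
    """Keep as many full-quality samples as fit in the budget; drop the rest."""
    kept = min(num_samples, budget_bytes // bytes_per_sample)
    # Returns (samples kept, fraction of original size per kept sample).
    return kept, 1.0


def uniform_compress(num_samples, bytes_per_sample, budget_bytes):
    """Keep every sample, shrinking each one to the same fraction of its size."""
    ratio = min(1.0, budget_bytes / (num_samples * bytes_per_sample))
    return num_samples, ratio


# Hypothetical setting: 1,000 images of 100 KB each, 25 MB of storage.
print(uniform_drop(1000, 100_000, 25_000_000))      # (250, 1.0)
print(uniform_compress(1000, 100_000, 25_000_000))  # (1000, 0.25)
```

Under the same budget, dropping trades quantity for quality (250 untouched images), while uniform compression trades quality for quantity (all 1,000 images at a quarter of their size). Neither accounts for how individual samples respond to compression.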

Abstract

On-device machine learning is often constrained by limited storage, particularly in continuous data collection scenarios. This paper presents an empirical study on storage-aware learning, focusing on the trade-off between data quantity and quality via compression. We demonstrate that naive strategies, such as uniform data dropping or one-size-fits-all compression, are suboptimal. Our findings further reveal that data samples exhibit varying sensitivities to compression, supporting the feasibility of a sample-wise adaptive compression strategy. These insights provide a foundation for developing a new class of storage-aware learning systems. The primary contribution of this work is the systematic characterization of this under-explored challenge, offering valuable insights that advance the understanding of storage-aware learning.
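
One way to picture a sample-wise adaptive strategy is as a budget allocation problem. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a per-sample sensitivity score and a table of stored sizes per quality level are already available (both hypothetical here), and it greedily lowers the quality of the least sensitive samples first until the storage budget is met.

```python
def allocate_quality(sizes, sensitivity, budget):
    """Greedy sketch of sample-wise quality allocation.

    sizes[i][q]    -- hypothetical stored size of sample i at quality level q
                      (level 0 is the smallest file, the last level the largest)
    sensitivity[i] -- hypothetical score; higher means compression hurts more
    budget         -- total storage allowed, in the same unit as sizes
    """
    # Start with every sample at its highest quality level.
    levels = [len(s) - 1 for s in sizes]
    total = sum(s[lvl] for s, lvl in zip(sizes, levels))

    # Visit samples from least to most sensitive.
    order = sorted(range(len(sizes)), key=lambda i: sensitivity[i])

    while total > budget:
        for i in order:
            if levels[i] > 0:
                # Lower this sample by one quality level and reclaim the space.
                total -= sizes[i][levels[i]] - sizes[i][levels[i] - 1]
                levels[i] -= 1
                break
        else:
            raise ValueError("budget too small even at the lowest quality")
    return levels


# Three samples, each with sizes 10/30/60 at quality levels 0/1/2.
sizes = [[10, 30, 60], [10, 30, 60], [10, 30, 60]]
sensitivity = [0.9, 0.1, 0.5]
print(allocate_quality(sizes, sensitivity, budget=120))  # [2, 0, 1]
```

In this toy run the most sensitive sample keeps full quality, while the least sensitive one is compressed hardest. How sensitivity would be estimated in practice is left open here.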

Taxonomy

Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Stochastic Gradient Optimization Techniques