Data Compression using Rank-1 Lattices for Parameter Estimation in Machine Learning
Michael Gnewuch, Kumar Harsha, Marcin Wnuk

TL;DR
This paper introduces a data compression method using rank-1 lattices to accelerate loss function computations in large-scale machine learning, leveraging quasi-Monte Carlo point sets for efficient data reduction.
Contribution
It develops rank-1 lattice data compression algorithms tailored to fast loss calculations, extending the approach of Dick and Feischl with error analysis and convergence guarantees for smooth functions.
Findings
Compression significantly speeds up loss calculations.
Error bounds depend on function smoothness and Fourier decay.
High convergence rates are achievable for sufficiently smooth functions.
Abstract
The mean squared error and regularized versions of it are standard loss functions in supervised machine learning. However, calculating these losses for large data sets can be computationally demanding. Modifying an approach of J. Dick and M. Feischl [Journal of Complexity 67 (2021)], we present algorithms to reduce extensive data sets to a smaller size using rank-1 lattices. Rank-1 lattices are quasi-Monte Carlo (QMC) point sets that are, if carefully chosen, well-distributed in a multidimensional unit cube. The compression strategy in the preprocessing step assigns every lattice point a pair of weights depending on the original data and responses, representing its relative importance. As a result, the compressed data makes iterative loss calculations in optimization steps much faster. We analyze the errors of our QMC data compression algorithms and the cost of the preprocessing step…
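To make the idea in the abstract concrete, here is a minimal sketch in Python. It builds a rank-1 lattice, the points frac(k z / n) for k = 0, ..., n-1, and attaches a pair of weights to each lattice point. It then evaluates the mean squared error through the expansion (1/M) Σ (f(x_i) - y_i)^2 = (1/M) Σ f(x_i)^2 - (2/M) Σ y_i f(x_i) + (1/M) Σ y_i^2. The weights below come from simple nearest-lattice-point binning, which is an assumption made purely for illustration and is not the weight construction analyzed in the paper. The function names, the Fibonacci generating vector (1, 144) with n = 233, and the synthetic data are likewise hypothetical choices.

```python
import numpy as np

def rank1_lattice(n, z):
    """Return the n lattice points frac(k * z / n), k = 0..n-1, in [0, 1)^d."""
    k = np.arange(n).reshape(-1, 1)
    z = np.asarray(z).reshape(1, -1)
    return ((k * z) % n) / n

def compress(X, y, P):
    """Illustrative stand-in for the preprocessing step (NOT the paper's weights):
    each data point is binned to its nearest lattice point."""
    # Squared Euclidean distance from every data point to every lattice point.
    d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    m = len(X)
    W = np.bincount(idx, minlength=len(P)) / m             # weight from the data only
    V = np.bincount(idx, weights=y, minlength=len(P)) / m  # weight from data and responses
    c = np.mean(y ** 2)                                    # constant term, computed once
    return W, V, c

def compressed_mse(f, P, W, V, c):
    """Approximate MSE using only the lattice points and the two weight vectors."""
    fp = f(P)
    return np.sum(W * fp ** 2) - 2.0 * np.sum(V * fp) + c

def full_mse(f, X, y):
    """Exact MSE over the whole data set, for comparison."""
    return np.mean((f(X) - y) ** 2)

# Synthetic data (assumption): 5000 points in [0, 1)^2 with a noisy smooth response.
rng = np.random.default_rng(0)
X = rng.random((5000, 2))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(5000)

P = rank1_lattice(233, (1, 144))   # two-dimensional Fibonacci lattice
W, V, c = compress(X, y, P)        # one-time preprocessing

model = lambda A: 0.5 * np.sin(2 * np.pi * A[:, 0])  # hypothetical model f
print(full_mse(model, X, y), compressed_mse(model, P, W, V, c))
```

After preprocessing, each loss evaluation touches only the 233 lattice points instead of all 5000 data points, which is where the speed-up in iterative optimization comes from. With this simplified binning the compressed loss is exact for constant models and only approximate otherwise. The error bounds and convergence rates described in the paper apply to its own weight construction, not to this sketch.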
Taxonomy
Topics: Advanced Data Compression Techniques
