Data Compression using Rank-1 Lattices for Parameter Estimation in Machine Learning

Michael Gnewuch; Kumar Harsha; Marcin Wnuk

arXiv:2409.13453 · math.NA · August 27, 2025

TL;DR

This paper introduces a data compression method using rank-1 lattices to accelerate loss function computations in large-scale machine learning, leveraging quasi-Monte Carlo point sets for efficient data reduction.

Contribution

The paper develops rank-1 lattice data compression algorithms tailored to fast loss calculations. It modifies the approach of J. Dick and M. Feischl and adds an error analysis with convergence guarantees for smooth functions.

Findings

01 Compressed data makes iterative loss calculations in optimization steps much faster.

02 Error bounds depend on the smoothness of the functions involved and the decay of their Fourier coefficients.

03 High convergence rates are achievable for sufficiently smooth functions.

Abstract

The mean squared error and regularized versions of it are standard loss functions in supervised machine learning. However, calculating these losses for large data sets can be computationally demanding. Modifying an approach of J. Dick and M. Feischl [Journal of Complexity 67 (2021)], we present algorithms to reduce extensive data sets to a smaller size using rank-1 lattices. Rank-1 lattices are quasi-Monte Carlo (QMC) point sets that are, if carefully chosen, well-distributed in a multidimensional unit cube. The compression strategy in the preprocessing step assigns every lattice point a pair of weights depending on the original data and responses, representing its relative importance. As a result, the compressed data makes iterative loss calculations in optimization steps much faster. We analyze the errors of our QMC data compression algorithms and the cost of the preprocessing step…
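
To make the term "rank-1 lattice" concrete, the following is a minimal sketch of how such a point set is generated: for N points and an integer generating vector z, the k-th point is the fractional part of k·z/N. The function name `rank1_lattice` and the choice N = 8, z = (1, 5) (a small two-dimensional Fibonacci lattice) are illustrative assumptions, not taken from the paper, and the sketch does not implement the paper's weight computation or compression step.

```python
import numpy as np

def rank1_lattice(n, z):
    """Return the n points {k * z / n mod 1 : k = 0, ..., n-1} in [0, 1)^d.

    n : number of lattice points
    z : integer generating vector of length d (hypothetical choice here)
    """
    z = np.asarray(z, dtype=np.int64)
    k = np.arange(n, dtype=np.int64).reshape(-1, 1)  # column of indices 0..n-1
    # Integer arithmetic modulo n avoids floating-point drift before scaling.
    return (k * z % n) / n

# Illustrative 2D Fibonacci lattice: n = 8, z = (1, 5).
points = rank1_lattice(8, [1, 5])
print(points)
```

How well the points are distributed depends on the generating vector; in practice z is chosen by a search procedure rather than by hand.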

Taxonomy

Topics: Advanced Data Compression Techniques