Using Low-Discrepancy Points for Data Compression in Machine Learning: An Experimental Comparison

Simone Göttlich; Jacob Heieck; Andreas Neuenkirch

arXiv:2407.07450·stat.ML·December 16, 2024

Open Access

TL;DR

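This paper compares two data compression methods based on low-discrepancy points for neural network training: the existing method of Dick and Feischl, which uses digital nets with averaging, and a new variant that uses digital nets with Voronoi clustering. Both are evaluated experimentally against the supercompress approach, a variant of K-means clustering.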

Contribution

Introduces a new compression method that combines digital nets with Voronoi clustering, and experimentally compares it with the digital-net averaging method of Dick and Feischl and with the K-means-based supercompress approach.

Findings

01

Both low-discrepancy methods achieve compression errors comparable to or lower than those of the supercompress baseline.

02

Voronoi clustering improves training accuracy in some cases.

03

The low-discrepancy methods outperform the K-means-based supercompress approach in certain scenarios.

Abstract

Low-discrepancy points (also called Quasi-Monte Carlo points) are deterministically and cleverly chosen point sets in the unit cube, which provide an approximation of the uniform distribution. We explore two methods based on such low-discrepancy points to reduce large data sets in order to train neural networks. The first one is the method of Dick and Feischl [4], which relies on digital nets and an averaging procedure. Motivated by our experimental findings, we construct a second method, which again uses digital nets, but Voronoi clustering instead of averaging. Both methods are compared to the supercompress approach of [14], which is a variant of the K-means clustering algorithm. The comparison is done in terms of the compression error for different objective functions and the accuracy of the training of a neural network.
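The idea of compressing a data set with low-discrepancy points and Voronoi clustering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' construction: it uses the Halton sequence (a classical low-discrepancy sequence) in place of digital nets, assigns each sample to its nearest point, and represents each non-empty Voronoi cell by the mean input, the mean response, and the cell count as a weight. The function names `van_der_corput`, `halton_2d`, and `voronoi_compress` are hypothetical.

```python
import random


def van_der_corput(i, base):
    """Radical inverse of the integer i in the given base, a value in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def halton_2d(n):
    """First n points of the 2D Halton sequence (bases 2 and 3).

    Assumption: Halton points stand in for the digital nets used in the paper.
    """
    return [(van_der_corput(i, 2), van_der_corput(i, 3)) for i in range(n)]


def voronoi_compress(xs, ys, centers):
    """Assign each sample to its nearest center (its Voronoi cell) and return
    one (mean input, mean response, count) triple per non-empty cell.

    Assumption: using cell means as representatives is an illustrative choice.
    """
    cells = {}
    for x, y in zip(xs, ys):
        nearest = min(
            range(len(centers)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])),
        )
        cells.setdefault(nearest, []).append((x, y))

    compressed = []
    for j in sorted(cells):
        members = cells[j]
        m = len(members)
        dim = len(members[0][0])
        mean_x = tuple(sum(p[0][d] for p in members) / m for d in range(dim))
        mean_y = sum(p[1] for p in members) / m
        compressed.append((mean_x, mean_y, m))
    return compressed


# Synthetic data in the unit square with a toy response (hypothetical example).
rng = random.Random(0)
xs = [(rng.random(), rng.random()) for _ in range(2000)]
ys = [x[0] + x[1] ** 2 for x in xs]

centers = halton_2d(32)
compressed = voronoi_compress(xs, ys, centers)

# The weighted mean of the compressed responses reproduces the full-data mean.
full_mean = sum(ys) / len(ys)
weighted_mean = sum(w * y for _, y, w in compressed) / sum(w for _, _, w in compressed)
```

Here 2000 samples are reduced to at most 32 weighted representatives, and a training loss could then be evaluated on the weighted representatives instead of the full data set. How closely such a weighted loss tracks the full loss is the compression error that the paper studies.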

Taxonomy

Topics: Numerical Methods and Algorithms

Methods: k-Means Clustering