Using Low-Discrepancy Points for Data Compression in Machine Learning: An Experimental Comparison
Simone Göttlich, Jacob Heieck, Andreas Neuenkirch

TL;DR
This paper compares two data compression methods based on low-discrepancy points for neural network training, one existing and one newly constructed, and evaluates them experimentally against the K-means-based supercompress approach.
Contribution
Experimentally compares two low-discrepancy point-based data compression methods for neural network training: the existing method of Dick and Feischl, and a new variant that uses Voronoi clustering instead of averaging.
Findings
Both methods achieve compression errors comparable to or lower than those of the supercompress baseline.
Voronoi clustering improves training accuracy in some cases.
Both methods outperform the K-means-based supercompress approach in certain scenarios.
Abstract
Low-discrepancy points (also called Quasi-Monte Carlo points) are deterministically and cleverly chosen point sets in the unit cube, which provide an approximation of the uniform distribution. We explore two methods based on such low-discrepancy points to reduce large data sets in order to train neural networks. The first one is the method of Dick and Feischl [4], which relies on digital nets and an averaging procedure. Motivated by our experimental findings, we construct a second method, which again uses digital nets, but Voronoi clustering instead of averaging. Both methods are compared to the supercompress approach of [14], which is a variant of the K-means clustering algorithm. The comparison is done in terms of the compression error for different objective functions and the accuracy of the training of a neural network.
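To make the Voronoi-clustering idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation and not the Dick and Feischl procedure: the data are assumed to be already scaled to the unit square, the digital net is the two-dimensional Hammersley point set in base 2, and the names `radical_inverse_base2`, `hammersley_net`, and `voronoi_compress` are hypothetical. Each data point is assigned to its nearest net point, and every non-empty cell is summarized by its mean response and its share of the data.

```python
import random


def radical_inverse_base2(i):
    """Van der Corput radical inverse: mirror the binary digits of i about the point."""
    x, f = 0.0, 0.5
    while i > 0:
        x += f * (i & 1)
        i >>= 1
        f *= 0.5
    return x


def hammersley_net(m):
    """2**m points (i / 2**m, phi_2(i)) in [0, 1)^2, a digital net in base 2."""
    n = 1 << m
    return [(i / n, radical_inverse_base2(i)) for i in range(n)]


def voronoi_compress(X, y, centers):
    """Assign each data point to its nearest center, then average per cell."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for (x0, x1), yi in zip(X, y):
        # Brute-force nearest neighbour in squared Euclidean distance.
        j = min(range(len(centers)),
                key=lambda k: (x0 - centers[k][0]) ** 2 + (x1 - centers[k][1]) ** 2)
        sums[j] += yi
        counts[j] += 1
    n = len(X)
    # Keep only non-empty cells: (center, mean response, weight = share of data).
    return [(centers[j], sums[j] / counts[j], counts[j] / n)
            for j in range(len(centers)) if counts[j] > 0]


# Synthetic data (an assumption for illustration): 2000 points, linear response.
random.seed(0)
X = [(random.random(), random.random()) for _ in range(2000)]
y = [x0 + x1 for x0, x1 in X]
compressed = voronoi_compress(X, y, hammersley_net(4))
```

The compressed set holds at most 16 weighted points in place of 2000, and the weighted mean of the cell averages reproduces the mean of the original responses. A weighted loss over `compressed` could then stand in for the full-data loss during training, which is the kind of compression error the comparison measures.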
Taxonomy
Topics: Numerical Methods and Algorithms
Methods: k-Means Clustering
