Efficient Data Reduction Via PCA-Guided Quantile Based Sampling
Foo Hui-Mean, Yuan-chin Ivan Chang

TL;DR
This paper introduces PCA-QS, a data reduction method that projects data onto principal components and applies quantile-based sampling to select representative subsets. It achieves lower mean squared error than uniform random, leverage score, and coreset sampling while remaining computationally efficient.
Contribution
The paper presents PCA-QS, a sampling method that combines PCA projection with quantile-based sampling to produce reduced datasets with lower mean squared error and better preservation of key data characteristics.
Findings
PCA-QS achieves lower mean squared error than uniform random sampling, leverage score sampling, and coreset methods.
PCA-QS is computationally efficient and adaptable to various data types.
PCA-QS preserves key data characteristics better than uniform and leverage score sampling.
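For reference, the leverage score baseline mentioned above is usually defined as follows for a full-column-rank design matrix $X \in \mathbb{R}^{n \times p}$ with rows $x_i^\top$. This is the textbook definition; the exact variant the authors benchmark against may differ.

```latex
h_{ii} = x_i^\top (X^\top X)^{-1} x_i, \qquad
\sum_{i=1}^{n} h_{ii} = \operatorname{tr}\!\big(X (X^\top X)^{-1} X^\top\big) = p, \qquad
\pi_i = \frac{h_{ii}}{p}
```

Rows are drawn with probability $\pi_i$, so high-leverage points are favored. Uniform random sampling corresponds to $\pi_i = 1/n$.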
Abstract
In large-scale statistical modeling, reducing data size through subsampling is essential for balancing computational efficiency and statistical accuracy. We propose a new method, Principal Component Analysis-guided Quantile Sampling (PCA-QS), which projects data onto principal components and applies quantile-based sampling to retain representative and diverse subsets. Compared with uniform random sampling, leverage score sampling, and coreset methods, PCA-QS consistently achieves lower mean squared error and better preservation of key data characteristics, while also being computationally efficient. This approach is adaptable to a variety of data scenarios and shows strong potential for broad applications in statistical computing.
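The summary does not spell out the exact algorithm, so the following is only a minimal sketch of one plausible reading of "project onto principal components, then sample by quantiles." Assumptions (not from the paper): only the first principal component is used; its scores are split into equal-frequency quantile bins; and an equal number of points is drawn uniformly at random from each bin. The names `first_principal_component` and `pca_quantile_sample` and the `n_bins` parameter are hypothetical. Power iteration stands in for a full eigendecomposition so the sketch needs only the standard library.

```python
import random
from bisect import bisect_right
from statistics import quantiles


def first_principal_component(X, iters=200, seed=0):
    """Center X and estimate its leading PC direction by power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # w = Xc^T (Xc v) is proportional to (sample covariance) @ v
        s = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
        w = [sum(s[i] * Xc[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return Xc, v


def pca_quantile_sample(X, k, n_bins=10, seed=0):
    """Hypothetical PCA-guided quantile sampler: stratify by PC1-score quantiles."""
    assert 0 < k <= len(X)
    Xc, v = first_principal_component(X, seed=seed)
    scores = [sum(a * b for a, b in zip(r, v)) for r in Xc]
    cuts = quantiles(scores, n=n_bins)  # n_bins - 1 cut points
    bins = [[] for _ in range(n_bins)]
    for i, s in enumerate(scores):
        bins[bisect_right(cuts, s)].append(i)  # bin index in 0..n_bins-1
    rng = random.Random(seed)
    base, extra = divmod(k, n_bins)
    chosen = []
    for b, members in enumerate(bins):
        quota = base + (1 if b < extra else 0)
        chosen.extend(rng.sample(members, min(quota, len(members))))
    # Top up uniformly if some bins were too small to fill their quota.
    if len(chosen) < k:
        taken = set(chosen)
        rest = [i for i in range(len(X)) if i not in taken]
        chosen.extend(rng.sample(rest, k - len(chosen)))
    return sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: large spread along the first axis, little along the others.
    X = [[rng.gauss(0, 5), rng.gauss(0, 1), rng.gauss(0, 0.5)] for _ in range(500)]
    idx = pca_quantile_sample(X, k=50, n_bins=10)
    print(len(idx), idx[:5])
```

Stratifying on quantiles of the projected scores forces the subsample to cover the tails as well as the center of the dominant direction of variation, which uniform sampling only does on average. A multi-component version could, for example, stratify on several PC scores jointly; that is likewise an assumption rather than a description of the authors' method.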
