Core-Elements for Large-Scale Least Squares Estimation
Mengyu Li, Jun Yu, Tao Li, Cheng Meng

TL;DR
This paper introduces a novel element-wise subset selection method called core-elements for large-scale least squares estimation, improving efficiency and robustness, especially with sparse data, and providing theoretical guarantees and superior empirical performance.
Contribution
The paper develops a deterministic core-elements algorithm for efficient, unbiased least squares estimation with theoretical bounds, addressing limitations of existing coreset methods in sparse settings.
Findings
Efficient $O(nnz(X)+rp^2)$ computational cost for large-scale data.
The estimator is unbiased and minimizes an upper bound on its estimation variance.
The method outperforms existing approaches on both synthetic and real datasets.
Abstract
The coresets approach, also called subsampling or subset selection, aims to select a subsample as a surrogate for the observed sample and has found extensive applications in large-scale data analysis. Existing coresets methods construct the subsample using a subset of rows from the predictor matrix. Such methods can be significantly inefficient when the predictor matrix is sparse or numerically sparse. To overcome this limitation, we develop a novel element-wise subset selection approach, called core-elements, for large-scale least squares estimation. We provide a deterministic algorithm to construct the core-elements estimator, only requiring an $O(nnz(X)+rp^2)$ computational cost, where $X$ is an $n \times p$ predictor matrix, $r$ is the number of elements selected from each column of $X$, and $nnz(\cdot)$ denotes the number of non-zero elements. Theoretically, we…
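The element-wise selection described in the abstract can be illustrated with a minimal NumPy sketch. The abstract above does not spell out the selection rule or the estimator formula, so two assumptions are made here and flagged in the code: the `r` largest-magnitude entries of each column are kept (all others are zeroed) to form a sparse surrogate `X_star`, and the estimate solves `(X_star^T X) beta = X_star^T y`. The function name `core_elements_estimate` is hypothetical. The dense arrays are for readability only and do not achieve the stated $O(nnz(X)+rp^2)$ cost, which would need sparse storage.

```python
import numpy as np

def core_elements_estimate(X, y, r):
    """Illustrative element-wise subset estimator (assumed form, not the authors' code)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")

    # ASSUMPTION: keep the r largest-magnitude entries in each column.
    # argpartition places the r smallest values of -|X| in the first r rows.
    idx = np.argpartition(-np.abs(X), r - 1, axis=0)[:r, :]  # shape (r, p)

    # Build the surrogate matrix: selected entries kept, the rest set to zero.
    X_star = np.zeros_like(X)
    cols = np.arange(p)
    X_star[idx, cols] = X[idx, cols]

    # ASSUMPTION: the estimator solves (X_star^T X) beta = X_star^T y.
    return np.linalg.solve(X_star.T @ X, X_star.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, r = 2000, 5, 50
    # Sparse predictor matrix: roughly 10% of entries are non-zero.
    X = rng.standard_normal((n, p)) * (rng.random((n, p)) < 0.1)
    beta = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
    y = X @ beta + 0.1 * rng.standard_normal(n)
    print(core_elements_estimate(X, y, r))
```

Under these assumptions, for a fixed `X_star` with `X_star^T X` invertible, the estimate is linear in `y` and recovers `beta` exactly when `y = X beta` has no noise, which is consistent with the unbiasedness claim listed under Findings.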
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Statistical Methods and Inference · Advanced Statistical Methods and Models
