Active Sampling Count Sketch (ASCS) for Online Sparse Estimation of a Trillion Scale Covariance Matrix
Zhenwei Dai, Aditya Desai, Reinhard Heckel, Anshumali Shrivastava

TL;DR
This paper introduces Active Sampling Count Sketch (ASCS), an online algorithm that efficiently estimates large sparse covariance matrices with trillions of entries, improving accuracy over vanilla Count Sketch by raising the signal-to-noise ratio.
Contribution
The paper presents a novel active sampling strategy integrated with Count Sketch for improved online sparse covariance estimation in high-dimensional data.
Findings
ASCS accurately recovers large covariance entries.
ASCS outperforms vanilla Count Sketch in noisy, high-dimensional settings.
The algorithm is effective on both synthetic and real-world datasets.
Abstract
Estimating and storing the covariance (or correlation) matrix of high-dimensional data is computationally challenging because both memory and computational requirements scale quadratically with the dimension. Fortunately, high-dimensional covariance matrices as observed in text, click-through, meta-genomics datasets, etc., are often sparse. In this paper, we consider the problem of efficient sparse estimation of covariance matrices with possibly trillions of entries. The size of the datasets we target requires the algorithm to be online, as more than one pass over the data is prohibitive. We propose Active Sampling Count Sketch (ASCS), an online and one-pass sketching algorithm that recovers the large entries of the covariance matrix accurately. Count Sketch (CS), and other sub-linear compressed sensing algorithms, offer a natural solution to the problem in theory.…
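To make the Count Sketch baseline mentioned in the abstract concrete, here is a minimal Python sketch of vanilla Count Sketch applied to streaming covariance estimation. It is not the ASCS algorithm: the active sampling step is not described in the text above, so it is left out. The class and function names, the hash family, the table sizes, and the zero-mean assumption on the data (so that the covariance of features i and j is the average of x_i * x_j) are all illustrative assumptions, not details taken from the paper.

```python
import random
import statistics

PRIME = (1 << 61) - 1  # Mersenne prime for the illustrative hash family


class CountSketch:
    """Vanilla Count Sketch: num_tables arrays of num_buckets counters."""

    def __init__(self, num_tables=5, num_buckets=256, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.tables = [[0.0] * num_buckets for _ in range(num_tables)]
        # Per table: (a, b) drive the bucket hash, (c, d) drive the sign hash.
        self.params = [
            tuple(rng.randrange(1, PRIME) for _ in range(4))
            for _ in range(num_tables)
        ]

    def _locate(self, t, key):
        a, b, c, d = self.params[t]
        bucket = ((a * key + b) % PRIME) % self.num_buckets
        sign = 1.0 if ((c * key + d) % PRIME) % 2 == 0 else -1.0
        return bucket, sign

    def update(self, key, value):
        # Add the signed value to one counter in every table.
        for t, table in enumerate(self.tables):
            bucket, sign = self._locate(t, key)
            table[bucket] += sign * value

    def query(self, key):
        # The median across tables damps the effect of hash collisions.
        estimates = []
        for t, table in enumerate(self.tables):
            bucket, sign = self._locate(t, key)
            estimates.append(sign * table[bucket])
        return statistics.median(estimates)


def stream_update(sketch, x, dim):
    """One pass over a single sample: add x[i] * x[j] for every pair i < j."""
    for i in range(dim):
        for j in range(i + 1, dim):
            sketch.update(i * dim + j, x[i] * x[j])


# Toy stream (assumed zero-mean): feature 7 is a noisy copy of feature 3.
rng = random.Random(1)
dim, n = 20, 300
samples = []
for _ in range(n):
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x[7] = x[3] + 0.1 * rng.gauss(0.0, 1.0)
    samples.append(x)

sketch = CountSketch()
for x in samples:
    stream_update(sketch, x, dim)

estimate = sketch.query(3 * dim + 7) / n          # sketched covariance of (3, 7)
exact = sum(x[3] * x[7] for x in samples) / n     # exact empirical value
print(f"sketched: {estimate:.3f}  exact: {exact:.3f}")
```

The sketch stores only num_tables * num_buckets counters regardless of how many feature pairs exist, which is what makes the approach sub-linear in the number of covariance entries. The small entries of the matrix still land in the same buckets as the large ones and act as collision noise; this is the signal-to-noise problem that ASCS is reported to address.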
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques · Indoor and Outdoor Localization Technologies
