Effective and Sparse Count-Sketch via k-means clustering
Yuhan Wang, Zijian Lei, Liang Lan

TL;DR
This paper introduces a novel count-sketch method that leverages k-means clustering to produce more accurate and sparser data sketches, improving machine learning performance and computational efficiency.
Contribution
The paper proposes a data-aware count-sketch approach using k-means clustering and gradient descent with L1 projection to enhance accuracy and sparsity.
Findings
Achieves higher classification accuracy than traditional count-sketch.
Produces sparser sketched matrices, reducing prediction costs.
Outperforms other matrix sketching algorithms on real datasets.
Abstract
Count-sketch is a popular matrix sketching algorithm that can produce a sketch of an input data matrix X in O(nnz(X)) time, where nnz(X) denotes the number of non-zero entries in X. The sketched matrix will be much smaller than X while preserving most of its properties. Therefore, count-sketch is widely used for addressing the high-dimensionality challenge in machine learning. However, count-sketch has two main limitations: (1) The sketching matrix used in count-sketch is generated randomly and does not consider any intrinsic data properties of X. This data-oblivious matrix sketching method could produce a bad sketched matrix, which will result in low accuracy for subsequent machine learning tasks (e.g., classification); (2) For highly sparse input data, count-sketch could produce a dense sketched data matrix. This dense sketch matrix could make the subsequent machine learning tasks more…
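For readers unfamiliar with the baseline, the following is a minimal sketch of standard (data-oblivious) count-sketch, not the k-means-based method proposed in the paper. It assumes X is given as coordinate triples (rows, cols, vals); the function name count_sketch, the matrix sizes, and the random test data are illustrative choices, not taken from the paper. Each feature is hashed to one random bucket with a random sign, so every non-zero entry of X is touched exactly once.

```python
import numpy as np

def count_sketch(rows, cols, vals, n, d, m, rng):
    """Sketch an n x d sparse matrix (COO triples) down to n x m.

    Hypothetical helper for illustration: each of the d features is
    assigned one random bucket in [0, m) and one random sign in {-1, +1}.
    """
    buckets = rng.integers(0, m, size=d)      # hash h: feature -> bucket
    signs = rng.choice([-1.0, 1.0], size=d)   # sign s: feature -> +/-1
    sketch = np.zeros((n, m))
    # One accumulation per non-zero entry, hence O(nnz(X)) work.
    np.add.at(sketch, (rows, buckets[cols]), signs[cols] * vals)
    return sketch, buckets, signs

rng = np.random.default_rng(0)
n, d, m = 200, 5000, 64
nnz = 10_000                                   # 1% of the entries are non-zero

# Random sparse X with distinct non-zero positions (synthetic data).
flat = rng.choice(n * d, size=nnz, replace=False)
rows, cols = np.divmod(flat, d)
vals = rng.standard_normal(nnz)

X_sketch, buckets, signs = count_sketch(rows, cols, vals, n, d, m, rng)

density_in = nnz / (n * d)
density_out = np.count_nonzero(X_sketch) / (n * m)
print(f"input density:  {density_in:.3f}")
print(f"sketch density: {density_out:.3f}")
```

On this synthetic input the sketch has far fewer columns than X but a much larger fraction of non-zero entries, which illustrates limitation (2) above. The random choice of buckets and signs, independent of X, is what limitation (1) refers to.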
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Face and Expression Recognition
Methods: k-Means Clustering
