Effective and Sparse Count-Sketch via k-means clustering

Yuhan Wang; Zijian Lei; Liang Lan

arXiv:2011.12046·cs.LG·November 30, 2020

TL;DR

This paper introduces a novel count-sketch method that leverages k-means clustering to produce more accurate and sparser data sketches, improving machine learning performance and computational efficiency.

Contribution

The paper proposes a data-aware count-sketch approach using k-means clustering and gradient descent with L1 projection to enhance accuracy and sparsity.
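One way to picture the data-aware idea is to let k-means decide which features share a bucket, instead of a random hash. The sketch below is only an illustration under stated assumptions, not the authors' algorithm: it clusters the feature columns with a plain Lloyd's k-means, uses unsigned (+1) entries, and omits the gradient-descent refinement with L1 projection entirely. The function names `kmeans_labels` and `clustered_sketch` are hypothetical.

```python
import numpy as np

def kmeans_labels(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct points (requires k <= len(points)).
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[c] = members.mean(axis=0)
    return labels

def clustered_sketch(X, k, seed=0):
    """Assumed illustration: features in the same cluster share a bucket."""
    d = X.shape[1]
    # Each feature (column of X) is treated as a point in R^n.
    labels = kmeans_labels(X.T.astype(float), k, seed=seed)
    S = np.zeros((d, k))
    S[np.arange(d), labels] = 1.0  # one non-zero per row, as in count-sketch
    return X @ S, S
```

As with count-sketch, the matrix `S` has a single non-zero entry per feature, so applying it to a new sample costs one update per non-zero feature value.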

Findings

1. Achieves higher classification accuracy than traditional count-sketch.

2. Produces sparser sketched matrices, reducing prediction costs.

3. Outperforms other matrix sketching algorithms on real datasets.

Abstract

Count-sketch is a popular matrix sketching algorithm that can produce a sketch of an input data matrix X in O(nnz(X)) time, where nnz(X) denotes the number of non-zero entries in X. The sketched matrix will be much smaller than X while preserving most of its properties. Therefore, count-sketch is widely used for addressing the high-dimensionality challenge in machine learning. However, count-sketch has two main limitations: (1) The sketching matrix used by count-sketch is generated randomly and does not consider any intrinsic data properties of X. This data-oblivious matrix sketching method could produce a bad sketched matrix, which will result in low accuracy for subsequent machine learning tasks (e.g., classification); (2) For highly sparse input data, count-sketch could produce a dense sketched data matrix. This dense sketch matrix could make the subsequent machine learning tasks more…
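For readers unfamiliar with the baseline, the following is a minimal sketch of standard (random, data-oblivious) count-sketch applied to the feature dimension. It assumes a dense NumPy array for simplicity; the function name `count_sketch` and its parameters are illustrative and do not come from the paper.

```python
import numpy as np

def count_sketch(X, k, seed=0):
    """Reduce the d columns of X (n x d) to k columns with random count-sketch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    buckets = rng.integers(0, k, size=d)     # random hash: feature -> bucket
    signs = rng.choice([-1.0, 1.0], size=d)  # random sign per feature
    sketch = np.zeros((n, k))
    rows, cols = np.nonzero(X)
    # One scatter-add update per non-zero entry of X. With a sparse storage
    # format, enumerating the non-zeros would also cost O(nnz(X)).
    np.add.at(sketch, (rows, buckets[cols]), X[rows, cols] * signs[cols])
    return sketch, buckets, signs
```

Because several features can land in the same bucket, a row with few non-zeros can still spread across many of the k output columns, which is the densification issue the abstract describes.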

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Face and Expression Recognition

Methods: k-Means Clustering