Bias-Aware Sketches

Jiecao Chen; Qin Zhang

arXiv:1610.07718 · cs.DS · March 28, 2017 · 1 citation

TL;DR

This paper introduces bias-aware linear sketching algorithms for large-scale data processing. Their error guarantees improve on those of standard linear sketches, especially when the coordinates of the input vector exhibit a bias. The guarantees are proved rigorously and supported by extensive experiments.

Contribution

The paper proposes the first bias-aware linear sketches, proves rigorous error guarantees for them, and demonstrates that they outperform existing methods in practice.

Findings

01. Bias-aware sketches outperform traditional sketches in biased datasets.

02. Theoretical proofs establish improved error bounds for the new sketches.

03. Experimental results confirm the practical effectiveness of bias-aware sketches.

Abstract

Linear sketching algorithms have been widely used for processing large-scale distributed and streaming datasets. Their popularity is largely due to the fact that linear sketches can be naturally composed in the distributed model and efficiently updated in the streaming model. The errors of linear sketches are typically expressed in terms of the sum of the coordinates of the input vector excluding the largest ones, that is, the mass on the tail of the vector. Thus, the precondition for these algorithms to perform well is that the mass on the tail is small. This is not always the case: in many real-world datasets the coordinates of the input vector have a bias, which generates a large mass on the tail. In this paper we propose linear sketches that are bias-aware. We rigorously prove that they achieve strictly better error guarantees than the corresponding…
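The two ideas in the abstract, composability of linear sketches and the effect of a bias on the tail mass, can be illustrated with a small Python sketch. This is not the authors' algorithm. It uses a textbook Count-Sketch as a stand-in linear sketch, and the names `CountSketch`, `tail_mass`, and the known bias value `100` are assumptions made for illustration. In particular, the example subtracts a bias that is known in advance, whereas the abstract concerns sketches that handle the bias themselves.

```python
import random
from statistics import median


def tail_mass(x, k):
    """Sum of |x_i| over all coordinates except the k largest in magnitude."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(mags[k:])


class CountSketch:
    """Textbook Count-Sketch over vectors of length n (illustrative only)."""

    def __init__(self, n, width, depth, seed=0):
        rng = random.Random(seed)  # same seed => same hash functions
        self.width, self.depth = width, depth
        self.bucket = [[rng.randrange(width) for _ in range(n)] for _ in range(depth)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def update(self, i, delta):
        # Streaming update: add delta to coordinate i.
        for r in range(self.depth):
            self.table[r][self.bucket[r][i]] += self.sign[r][i] * delta

    def merge(self, other):
        # Distributed composition: valid only if both sketches share a seed.
        for r in range(self.depth):
            for c in range(self.width):
                self.table[r][c] += other.table[r][c]

    def estimate(self, i):
        return median(
            self.sign[r][i] * self.table[r][self.bucket[r][i]]
            for r in range(self.depth)
        )


# A biased vector: every coordinate sits at 100, two coordinates are heavy.
n, bias = 1000, 100  # the bias value is assumed known here
x = [bias] * n
x[3], x[7] = 5000, 4000

# The tail mass is large because of the bias, and vanishes once it is removed.
tail_before = tail_mass(x, k=2)
tail_after = tail_mass([v - bias for v in x], k=2)

# Linearity: sketching two halves separately and merging equals sketching all of x.
whole = CountSketch(n, width=32, depth=3)
part_a = CountSketch(n, width=32, depth=3)
part_b = CountSketch(n, width=32, depth=3)
for i, v in enumerate(x):
    whole.update(i, v)
    (part_a if i < n // 2 else part_b).update(i, v)
part_a.merge(part_b)
```

Here `tail_before` is 99800 and `tail_after` is 0, so an error bound stated in terms of the tail mass is far weaker on the original vector than on the de-biased one. The merged table in `part_a` matches `whole.table` exactly because every update is an integer addition into the same buckets.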

Taxonomy

Topics: Data Stream Mining Techniques · Advanced Image and Video Retrieval Techniques · Sparse and Compressive Sensing Techniques