Composable Sketches for Functions of Frequencies: Beyond the Worst Case

Edith Cohen; Ofir Geri; Rasmus Pagh

arXiv:2004.04772 · cs.DS · November 4, 2021 · 5 cites

TL;DR

This paper studies when compact, composable sketches can be built for weighted sampling and statistics estimation over functions of data frequencies. It shows that, under practical assumptions, small sketches suffice even for functions that require polynomial-size sketches in the worst case.

Contribution

It introduces methods for efficient sketches under realistic assumptions, extending prior work on heavy hitters and frequency moments.

Findings

01

In practice, small polylogarithmic-size sketches give accurate estimates for functions that are hard in the worst case, such as thresholds and p-th frequency moments with p > 2.

02

Performance improves with noisy frequency advice or distributional assumptions.

03

Empirical results support theoretical findings.

Abstract

Recently there has been increased interest in using machine learning techniques to improve classical algorithms. In this paper we study when it is possible to construct compact, composable sketches for weighted sampling and statistics estimation according to functions of data frequencies. Such structures are now central components of large-scale data analytics and machine learning pipelines. However, many common functions, such as thresholds and p-th frequency moments with p > 2, are known to require polynomial-size sketches in the worst case. We explore performance beyond the worst case under two different types of assumptions. The first is having access to noisy advice on item frequencies. This continues the line of work of Hsu et al. (ICLR 2019), who assume predictions are provided by a machine learning model. The second is providing guaranteed performance on a restricted class of…
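As a rough illustration of what "composable" means in the abstract: a sketch is composable when the sketch of a combined dataset can be computed from the sketches of its parts, without revisiting the data. The minimal example below uses a Count-Min sketch, a standard structure chosen here only for illustration; it is not the construction from the paper, and the names `CountMinSketch`, `sketch_of`, `width`, and `depth` are hypothetical.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: a small table of counters that can only overestimate."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Deterministic per-row hash, so independently built sketches agree.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, key, value=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += value

    def estimate(self, key):
        # With nonnegative updates every row overcounts, so take the minimum.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )

    def merge(self, other):
        """Return the sketch of the combined data by adding counters cell by cell."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share dimensions and hash functions")
        merged = CountMinSketch(self.width, self.depth)
        for row in range(self.depth):
            merged.table[row] = [
                a + b for a, b in zip(self.table[row], other.table[row])
            ]
        return merged


def sketch_of(items, **kwargs):
    """Build a sketch from an iterable of items, one unit update per item."""
    sketch = CountMinSketch(**kwargs)
    for item in items:
        sketch.update(item)
    return sketch


# Two parts of a dataset, sketched separately and then merged.
part_a = ["x", "y", "x"]
part_b = ["x", "z"]
merged = sketch_of(part_a).merge(sketch_of(part_b))
direct = sketch_of(part_a + part_b)
```

Here `merged` and `direct` hold identical counter tables, which is the composability property. The frequency estimate for any item is never below its true count, so `merged.estimate("x")` is at least 3.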

Videos

Composable Sketches for Functions of Frequencies: Beyond the Worst Case · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Algorithms and Data Compression