Adaptive Sampling for Rapidly Matching Histograms

Stephen Macke; Yiming Zhang; Silu Huang; Aditya Parameswaran

arXiv:1708.05918·cs.DB·May 9, 2018

TL;DR

FastMatch is an interactive system that rapidly retrieves the histograms most similar to a target. It relies on a probabilistic sampling algorithm that substantially reduces computation time while maintaining high accuracy.

Contribution

The paper introduces HistSim, a theoretically sound sampling algorithm, and integrates it into FastMatch for efficient, accurate histogram similarity retrieval.

Findings

01. Achieves up to 35x speedup over non-sampling methods.

02. Maintains near-perfect accuracy in identifying similar histograms.

03. Effective on real-world datasets.

Abstract

In exploratory data analysis, analysts often have a need to identify histograms that possess a specific distribution, among a large class of candidate histograms, e.g., find countries whose income distribution is most similar to that of Greece. This distribution could be a new one that the user is curious about, or a known distribution from an existing histogram visualization. At present, this process of identification is brute-force, requiring the manual generation and evaluation of a large number of histograms. We present FastMatch: an end-to-end approach for interactively retrieving the histogram visualizations most similar to a user-specified target, from a large collection of histograms. The primary technical contribution underlying FastMatch is a probabilistic algorithm, HistSim, a theoretically sound sampling-based approach to identify the top-$k$ closest histograms under…
