Building Wavelet Histograms on Large Data in MapReduce
Jeffrey Jestes, Ke Yi, Feifei Li

TL;DR
This paper presents new algorithms for efficiently constructing wavelet histograms on large datasets using MapReduce, significantly improving performance over existing methods.
Contribution
The paper introduces novel algorithms for exact and approximate wavelet histogram construction optimized for MapReduce environments, demonstrating substantial efficiency gains.
Findings
Significant reduction in computation time and communication costs.
Order-of-magnitude performance improvements over baseline methods.
Effective implementation in Hadoop with large real and synthetic datasets.
Abstract
MapReduce is becoming the de facto framework for storing and processing massive data, due to its excellent scalability, reliability, and elasticity. In many MapReduce applications, obtaining a compact, accurate summary of data is essential. Among various data summarization tools, histograms have proven to be particularly important and useful for summarizing data, and the wavelet histogram is one of the most widely used histograms. In this paper, we investigate the problem of building wavelet histograms efficiently on large datasets in MapReduce. We measure the efficiency of the algorithms by both end-to-end running time and communication cost. We demonstrate that straightforward adaptations of existing exact and approximate methods for building wavelet histograms to MapReduce clusters are highly inefficient. To that end, we design new algorithms for computing exact and approximate wavelet…
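The abstract does not spell out what a wavelet histogram is. The sketch below assumes the standard textbook formulation, not anything specific to this paper. Keys come from an integer domain [0, n) with n a power of 2. The histogram is built from the key frequency vector's Haar coefficients, keeping the k coefficients with the largest normalized magnitude. The sketch also shows why the problem decomposes over data splits: the Haar transform is linear, so the global coefficients equal the sum of per-split local coefficients. All function names (`haar_coefficients`, `level`, `wavelet_histogram`, `split_coefficients`, `merge`) are hypothetical, and the merge step is only one naive way to combine splits. It is not the paper's exact or approximate algorithm.

```python
import heapq
import math
from collections import Counter


def haar_coefficients(freq):
    """Unnormalized Haar wavelet transform of a frequency vector.

    len(freq) must be a power of 2. Output order: overall average first,
    then detail coefficients from the coarsest level to the finest.
    """
    n = len(freq)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    current = [float(x) for x in freq]
    details = []
    while len(current) > 1:
        pairs = [(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        # Each pass produces a coarser level, which belongs in front of the finer ones.
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


def level(i):
    """Resolution level of coefficient i (0 = coarsest)."""
    return 0 if i == 0 else i.bit_length() - 1


def wavelet_histogram(coeffs, k):
    """Keep the k coefficients with the largest normalized magnitude.

    Scaling by sqrt(n / 2**level) gives each coefficient its size in an
    orthonormal basis, so keeping the top k minimizes the sum of squared
    errors among approximations that retain k coefficients.
    """
    n = len(coeffs)

    def weight(i):
        return abs(coeffs[i]) * math.sqrt(n / 2 ** level(i))

    return {i: coeffs[i] for i in heapq.nlargest(k, range(n), key=weight)}


def split_coefficients(keys, n):
    """Local Haar coefficients of one data split (hypothetical mapper step)."""
    counts = Counter(keys)
    return haar_coefficients([counts.get(x, 0) for x in range(n)])


def merge(local_coeff_lists):
    """The Haar transform is linear, so global coefficients are sums of local ones."""
    return [sum(col) for col in zip(*local_coeff_lists)]


# Usage: two hypothetical splits of keys over the domain [0, 4).
splits = [[0, 1, 1, 3], [1, 2, 3, 3, 3]]
global_coeffs = merge(split_coefficients(s, 4) for s in splits)
print(global_coeffs)                        # [2.25, -0.25, -1.0, -1.5]
print(wavelet_histogram(global_coeffs, 2))  # {0: 2.25, 3: -1.5}
```

In this naive combination every split ships up to n local coefficients, even though only k survive in the final histogram. That gap illustrates why the abstract treats communication cost as a primary efficiency measure.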
Taxonomy
Topics: Cloud Computing and Resource Management · Graph Theory and Algorithms · Data Management and Algorithms
