Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays
Daniel Lemire, Owen Kaser

TL;DR
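This paper introduces hierarchical bin buffering algorithms for computing local moments over very large external memory arrays. With precomputation and additional buffer space, query time drops from O(n) to O(sqrt n) with simple bins and to logarithmic time with hierarchical buffering. The simpler bin-based algorithms also apply to algebraic queries such as MAX, AVERAGE, and SUM.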
Contribution
It presents novel hierarchical buffering techniques that improve query efficiency for local moments in large external memory arrays, using less storage than wavelet-based methods.
Findings
Query time reduced to O(sqrt n) with simple bin partitioning.
Hierarchical buffering achieves logarithmic query time.
Overlapped Bin Buffering uses minimal storage while maintaining efficiency.
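The O(sqrt n) finding for simple bin partitioning can be illustrated with a minimal sketch. It covers only the SUM query over an in-memory Python list standing in for the external array; the function names `build_bins` and `range_sum_bins` are hypothetical and not taken from the paper, and the paper's local-moment buffers would hold more than one plain sum per bin.

```python
import math


def build_bins(a):
    """Precompute one sum per bin of roughly sqrt(n) consecutive cells.

    Illustrative assumption: `a` is an in-memory list standing in for
    the external array; names here are hypothetical.
    """
    size = max(1, math.isqrt(len(a)))
    sums = [sum(a[i:i + size]) for i in range(0, len(a), size)]
    return size, sums


def range_sum_bins(a, size, sums, lo, hi):
    """Return sum(a[lo:hi]) touching O(sqrt n) cells and buffer entries."""
    total = 0
    p = lo
    # Leading cells before the first bin boundary: read from the array.
    while p < hi and p % size != 0:
        total += a[p]
        p += 1
    # Whole bins inside the range: read one buffered sum per bin.
    while p + size <= hi:
        total += sums[p // size]
        p += size
    # Trailing cells after the last whole bin: read from the array.
    while p < hi:
        total += a[p]
        p += 1
    return total


if __name__ == "__main__":
    data = list(range(1, 101))
    size, sums = build_bins(data)
    # The buffered answer should agree with a direct scan.
    print(range_sum_bins(data, size, sums, 3, 57) == sum(data[3:57]))
```

A query reads at most about sqrt(n) array cells at each end of the range plus at most about sqrt(n) buffered sums in between, which is where the O(sqrt n) bound comes from in this simplified setting.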
Abstract
Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size sqrt n, time complexity drops to O(sqrt n). A more sophisticated approach uses hierarchical buffering and has a logarithmic time complexity (O(b log_b n)), when using…
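To make the O(b log_b n) bound concrete, here is a minimal sketch of one way hierarchical buffering could work for the SUM query. It is an illustration under stated assumptions, not the authors' algorithm: the array is an in-memory list, level l stores sums of aligned blocks of b^l cells, and the names `build_levels` and `range_sum_hier` are hypothetical.

```python
def build_levels(a, b):
    """Build buffers where level l holds sums of full blocks of b**l cells.

    Level 0 is a copy of the data itself (the external array in the
    paper's setting; an in-memory list here, as an assumption).
    """
    if b < 2:
        raise ValueError("branching factor b must be at least 2")
    levels = [list(a)]
    while len(levels[-1]) >= b:
        prev = levels[-1]
        # Keep only full blocks so every entry covers exactly b**l cells.
        nxt = [sum(prev[i:i + b]) for i in range(0, len(prev) - b + 1, b)]
        levels.append(nxt)
    return levels


def range_sum_hier(levels, b, lo, hi):
    """Return the sum of cells lo..hi-1 using the largest aligned blocks."""
    total = 0
    p = lo
    while p < hi:
        level, size = 0, 1
        # Climb while the next larger block is aligned at p and fits.
        while (level + 1 < len(levels)
               and p % (size * b) == 0
               and p + size * b <= hi):
            size *= b
            level += 1
        total += levels[level][p // size]
        p += size
    return total


if __name__ == "__main__":
    data = [(7 * i) % 13 for i in range(1000)]
    levels = build_levels(data, 4)
    # The hierarchical answer should agree with a direct scan.
    print(range_sum_hier(levels, 4, 17, 903) == sum(data[17:903]))
```

Each level contributes only a bounded number of blocks (on the order of b) to any query, and there are about log_b n levels, which gives the O(b log_b n) shape quoted in the abstract for this simplified variant.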
Taxonomy
Topics: Algorithms and Data Compression · Advanced Database Systems and Queries · Advanced Data Storage Technologies
