TL;DR
Histogram sort with sampling (HSS) is a new parallel sorting algorithm that reduces data movement and communication costs. It combines sampling with iterative histogramming to choose how keys are partitioned before redistribution, and it achieves high practical performance.
Contribution
HSS introduces a novel combination of sampling and iterative histogramming that significantly reduces communication compared to existing algorithms.
Findings
HSS requires a factor of Θ(log(p)/log log(p)) less communication than the best known (recently introduced) partitioning algorithm.
HSS outperforms standard Sample sort and Histogram sort variants in practical benchmarks.
Application studies show the effectiveness of HSS in real-world scenarios.
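The snippet below gives a rough sense of how large the Θ(log(p)/log log(p)) factor from the first finding is. It evaluates log(p)/log log(p) for a few processor counts p. The constants hidden by Θ are ignored, and the base-2 logarithm and the name `comm_factor` are arbitrary illustrative choices. The numbers therefore show only how the factor grows with p, not measured savings.

```python
import math


def comm_factor(p):
    """log2(p) / log2(log2(p)): growth of the Theta(...) factor, constants ignored.

    Only meaningful for p > 2 (log2(log2(2)) is 0).
    """
    lg = math.log2(p)
    return lg / math.log2(lg)


for p in (16, 1024, 65536):
    print(p, round(comm_factor(p), 2))
# prints:
# 16 2.0
# 1024 3.01
# 65536 4.0
```

The factor grows very slowly with p. It roughly doubles between 16 and 65,536 processors.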
Abstract
To minimize data movement, state-of-the-art parallel sorting algorithms use techniques based on sampling and histogramming to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of the keys, while histogramming enables evaluation and iterative improvement of a given partition. We introduce Histogram sort with sampling (HSS), which combines sampling and iterative histogramming to find high-quality partitions with minimal data movement and high practical performance. Compared to the best known (recently introduced) algorithm for finding these partitions, our algorithm requires a factor of Θ(log(p)/log log(p)) less communication, and substantially less when compared to standard variants of Sample sort and Histogram sort. We provide a distributed-memory implementation of the proposed algorithm, compare its performance to two…
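The abstract describes the combination of sampling and iterative histogramming only at a high level. The sketch below simulates that general idea sequentially in Python and is a minimal sketch under stated assumptions. Each round samples keys that still lie inside some open splitter bracket and histograms them, which here means computing their global ranks. It then narrows each splitter's bracket until a sampled key falls within a tolerance of that splitter's target rank.

The following are hypothetical illustrative choices, not the authors' implementation or sampling schedule:

- the function name `hss_splitters`;
- the parameters `eps`, `samples_per_round`, `max_rounds` and `seed`;
- the fixed per-round sample budget;
- the assumption that all keys are distinct.

In a real distributed-memory run, the sample would be gathered and broadcast, and the per-processor counts would be combined with an all-reduce.

```python
import bisect
import math
import random


def hss_splitters(local_data, num_parts, eps=0.05, samples_per_round=64,
                  max_rounds=50, seed=0):
    """Simulate HSS-style splitter selection on one machine (illustrative sketch).

    local_data: one list of keys per simulated processor; keys ASSUMED distinct.
    Returns (splitters, rounds): num_parts - 1 splitter keys and rounds used.
    """
    rng = random.Random(seed)
    local = [sorted(keys) for keys in local_data]  # local sort on each "processor"
    n = sum(len(keys) for keys in local)
    targets = [i * n // num_parts for i in range(1, num_parts)]  # ideal global ranks
    tol = eps * n / num_parts  # hypothetical allowed deviation from each target rank

    def global_rank(key):
        # Histogramming: every processor counts its keys smaller than `key`;
        # summing the counts stands in for an all-reduce.
        return sum(bisect.bisect_left(keys, key) for keys in local)

    lo = [-math.inf] * len(targets)  # open bracket (lo[i], hi[i]) for splitter i
    hi = [math.inf] * len(targets)
    splitters = [None] * len(targets)

    for rnd in range(1, max_rounds + 1):
        open_ids = [i for i, s in enumerate(splitters) if s is None]
        # Only keys strictly inside a still-open bracket are eligible for sampling.
        candidates = set()
        for keys in local:
            for i in open_ids:
                a = bisect.bisect_right(keys, lo[i])
                b = bisect.bisect_left(keys, hi[i])
                candidates.update(keys[a:b])
        # Sampling: keep each candidate with probability q (about samples_per_round
        # keys per round; every remaining candidate once few are left).
        q = min(1.0, samples_per_round / max(1, len(candidates)))
        sample = sorted(k for k in candidates if rng.random() < q)
        ranks = [global_rank(k) for k in sample]
        for i in open_ids:
            t = targets[i]
            for k, r in zip(sample, ranks):
                if abs(r - t) <= tol:
                    splitters[i] = k  # close enough to the target rank: finalize
                    break
                if r < t and k > lo[i]:
                    lo[i] = k  # tighten the bracket from below
                elif r > t and k < hi[i]:
                    hi[i] = k  # tighten the bracket from above
        if all(s is not None for s in splitters):
            return splitters, rnd
    raise RuntimeError("splitters did not converge within max_rounds")
```

For example, `hss_splitters(data, num_parts=8)` takes `data` as a list of eight per-processor key lists and returns seven splitters together with the number of rounds used.

The fixed sample budget keeps each round's histogram small while the brackets are wide. Once few candidates remain, the sample becomes exhaustive. With distinct keys, the key whose rank equals a target always lies inside that target's open bracket, so every splitter is eventually finalized.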
