TL;DR
This paper introduces highly efficient in-place sorting algorithms, notably IPS$^4$o, that outperform existing methods across various data types, distributions, and hardware configurations by combining cache-efficient, parallel, and adaptive techniques.
Contribution
The paper presents a novel blockwise in-place data distribution technique and a robust, highly optimized comparison-based sorting algorithm, IPS$^4$o, that significantly outperforms prior in-place and non-in-place algorithms.
Findings
IPS$^4$o outperforms the best in-place parallel comparison-based competitor by almost a factor of three.
The new in-place radix sorter is the best-performing algorithm in many scenarios involving small keys or near-uniform input distributions.
Extensive experiments confirm the robustness and superiority of the proposed algorithms across diverse settings.
Abstract
We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the in-place feature often implied performance penalties. Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably cache-efficient. We also parallelize this approach taking dynamic load balancing and memory locality into account. Our comparison-based algorithm, In-place Superscalar Samplesort (IPS$^4$o), combines this technique with branchless decision trees. By taking cases with many equal elements into account and by adapting the distribution degree dynamically, we obtain a highly robust algorithm that outperforms the best in-place parallel comparison-based competitor by almost a factor of three. IPS$^4$o…
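To make the "branchless decision trees" mentioned in the abstract more concrete, here is a minimal sketch of how an element can be assigned to a bucket without a data-dependent `if`. It is not the authors' implementation. It assumes the usual samplesort setup: k - 1 sorted splitters (with k a power of two) stored as an implicit binary search tree in an array. The names `build_splitter_tree` and `classify` are hypothetical, and the equal-element handling and in-place block distribution described in the abstract are not shown. Python gains nothing from avoiding branches, so the sketch only illustrates the index arithmetic that a compiled implementation could turn into comparison and add instructions.

```python
def build_splitter_tree(splitters):
    """Store k-1 sorted splitters as an implicit binary search tree.

    The tree uses 1-based heap order: node i has children 2*i and 2*i + 1.
    Slot 0 is unused. The bucket count k must be a power of two.
    """
    k = len(splitters) + 1
    if k < 2 or k & (k - 1) != 0:
        raise ValueError("number of buckets (len(splitters) + 1) must be a power of two")
    tree = [None] * k

    def fill(node, lo, hi):
        # Place the median of splitters[lo:hi] at this node, then recurse.
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        tree[node] = splitters[mid]
        fill(2 * node, lo, mid)
        fill(2 * node + 1, mid + 1, hi)

    fill(1, 0, len(splitters))
    return tree


def classify(x, tree):
    """Return the bucket index of x in the range 0 .. k-1.

    Each level does one comparison and adds its 0/1 result to the node
    index, so the traversal has no data-dependent conditional jump.
    """
    k = len(tree)
    levels = k.bit_length() - 1  # log2(k), since k is a power of two
    i = 1
    for _ in range(levels):
        i = 2 * i + (x > tree[i])  # bool is added as 0 or 1
    return i - k  # leaves are numbered k .. 2k-1


def distribute(values, splitters):
    """Out-of-place distribution into k buckets, for illustration only."""
    tree = build_splitter_tree(splitters)
    buckets = [[] for _ in range(len(tree))]
    for v in values:
        buckets[classify(v, tree)].append(v)
    return buckets
```

With splitters `[10, 20, 30]`, `classify(25, build_splitter_tree([10, 20, 30]))` returns bucket 2. An element equal to a splitter goes to the lower bucket, because the comparison is strict.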
