Practical Massively Parallel Sorting
Michael Axtmann, Timo Bingmann, Peter Sanders, and Christian Schulz

TL;DR
This paper introduces scalable parallel sorting algorithms that balance communication volume against critical path length and demonstrates that they are efficient in practice on large-scale machines.
Contribution
It presents multi-level generalizations of sample sort and multiway mergesort, along with new tools for small input sorting, constrained bin packing, and efficient data delivery.
Findings
The multi-level sample sort variant is highly scalable.
Algorithms perform well on large-scale machines.
New tools improve small input sorting and data delivery efficiency.
Abstract
Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and overcome this gap both in theory and practice. The algorithms are multi-level generalizations of the known algorithms sample sort and multiway mergesort. In particular, our sample sort variant turns out to be very scalable. Some tools we develop may be of independent interest -- a simple, practical, and flexible sorting algorithm for small inputs working in logarithmic time, a near-linear-time optimal algorithm for solving a constrained bin packing problem, and an algorithm for data delivery that guarantees a small number of message startups on each processor.
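To make the building block concrete, the following is a minimal sketch of classic single-level sample sort, simulated sequentially in Python. It is not the authors' multi-level algorithm: the function name `sample_sort`, the `oversampling` parameter, and the use of plain lists in place of processors and messages are all assumptions made for illustration. A multi-level variant would, roughly speaking, apply the same splitter-and-exchange step to groups of processors and then recurse within each group.

```python
import bisect
import random


def sample_sort(blocks, oversampling=8, seed=0):
    """Simulate one level of sample sort.

    `blocks` is a list of p lists, one per simulated processor (hypothetical
    stand-in for distributed memory). Returns p sorted lists whose
    concatenation is the globally sorted input.
    """
    p = len(blocks)
    rng = random.Random(seed)

    # Step 1: every simulated processor contributes a small random sample.
    sample = []
    for block in blocks:
        k = min(oversampling, len(block))
        sample.extend(rng.sample(block, k))
    sample.sort()

    # Step 2: choose p - 1 evenly spaced splitters from the sorted sample.
    splitters = [sample[(i * len(sample)) // p] for i in range(1, p)] if sample else []

    # Step 3: route each element to the bucket of its destination processor.
    # On a real machine this is the all-to-all data exchange.
    buckets = [[] for _ in range(p)]
    for block in blocks:
        for x in block:
            buckets[bisect.bisect_right(splitters, x)].append(x)

    # Step 4: each processor sorts the elements it received locally.
    return [sorted(bucket) for bucket in buckets]
```

In this sketch the single exchange in step 3 is where every processor may send a message to every other processor. That all-to-all pattern is the cost a multi-level scheme aims to reduce by exchanging data within smaller groups at each level.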
Taxonomy
Topics: Interconnection Networks and Systems · Algorithms and Data Compression · DNA and Biological Computing
