TL;DR
This paper introduces distributed string sorting algorithms that scale to massively parallel distributed-memory machines and run up to 5 times faster than the previous state-of-the-art distributed string sorters.
Contribution
The paper presents practical distributed-memory string sorting algorithms whose latency is about p^{1/k} on p processors when the data is communicated k times, enabling efficient sorting on tens of thousands of cores.
Findings
Achieves speedups of up to 5x over the current state-of-the-art distributed string sorting algorithms.
Scales well across a wide range of inputs on up to 49152 cores.
Latency is about p^{1/k} on p processors when the data is communicated k times.
Abstract
String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines, since they either have latency (at least) proportional to the number of processors or communicate the data a large number of times (at least logarithmic). We present practical and efficient algorithms for distributed-memory string sorting that scale to large numbers of processors p. Similar to state-of-the-art sorters for atomic objects, the algorithms have latency of about p^{1/k} when allowing the data to be communicated k times. Experiments indicate good scaling behavior on a wide range of inputs on up to 49152 cores. Overall, we achieve speedups of up to 5 over the current state-of-the-art distributed string sorting algorithms.
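To make the latency trade-off in the abstract concrete, the following minimal Python sketch evaluates p^{1/k} for the largest machine size mentioned (49152 cores) and a few values of k. The function names `partners_per_round` and `tradeoff` are hypothetical, and the model is an assumption made for illustration: it treats p^{1/k} as the approximate number of message partners per processor in each of k rounds and ignores constant factors, message sizes, and everything else the paper analyzes.

```python
def partners_per_round(p: int, k: int) -> float:
    """Approximate message partners per processor in one round.

    Assumption for illustration: with k communication rounds, each
    processor exchanges messages with about p**(1/k) others per round.
    """
    if p < 1 or k < 1:
        raise ValueError("p and k must be positive integers")
    return p ** (1.0 / k)


def tradeoff(p: int, max_k: int) -> list[tuple[int, float, float]]:
    """Return (k, partners per round, partners summed over k rounds)."""
    rows = []
    for k in range(1, max_k + 1):
        per_round = partners_per_round(p, k)
        rows.append((k, per_round, k * per_round))
    return rows


if __name__ == "__main__":
    # 49152 is the largest core count reported in the experiments.
    for k, per_round, total in tradeoff(49152, 3):
        print(f"k={k}: ~{per_round:.1f} partners per round, "
              f"~{total:.1f} over all rounds, data moved {k} time(s)")
```

Under this simplified model, k = 1 corresponds to every processor talking to all p others, which is the latency bottleneck the abstract describes, while larger k reduces the partner count sharply at the cost of moving the data k times.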
