GPU sample sort
Nikolaj Leischner, Vitaly Osipov, Peter Sanders

TL;DR
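This paper presents a sample sort algorithm designed for manycore GPUs. It outperforms GPU Thrust merge sort and GPU quicksort on uniformly distributed keys, beats GPU Thrust radix sort on 64-bit integer keys, stays robust across key distributions and entropy levels, and scales almost linearly with input size.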
Contribution
The paper presents the first efficient GPU implementation of sample sort, demonstrating superior performance over traditional GPU sorting algorithms for various data types and distributions.
Findings
For uniformly distributed keys, sample sort is at least 25% and on average 68% faster than GPU Thrust merge sort.
For uniformly distributed keys, sample sort is on average more than 2 times faster than GPU quicksort.
For 64-bit integer keys, sample sort is at least 63% and on average 2 times faster than GPU Thrust radix sort.
Abstract
In this paper, we present the design of a sample sort algorithm for manycore GPUs. Despite being one of the most efficient comparison-based sorting algorithms for distributed memory architectures, its performance on GPUs was previously unknown. For uniformly distributed keys, our sample sort is at least 25% and on average 68% faster than the best comparison-based sorting algorithm, GPU Thrust merge sort, and on average more than 2 times faster than GPU quicksort. Moreover, for 64-bit integer keys it is at least 63% and on average 2 times faster than the highly optimized GPU Thrust radix sort that directly manipulates the binary representation of keys. Our implementation is robust to different distributions and entropy levels of keys and scales almost linearly with the input size. These results indicate that multi-way techniques in general and sample sort in particular achieve…
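To make the multi-way idea concrete, here is a minimal sequential sketch of generic k-way sample sort in Python. It is not the paper's GPU implementation: the function name `sample_sort` and the parameters `num_buckets`, `oversampling`, and `small` are illustrative assumptions, and the bucket routing that a GPU version would run in parallel is a plain loop here.

```python
import bisect
import random


def sample_sort(keys, num_buckets=4, oversampling=8, small=32, rng=None):
    """Sequential sketch of k-way sample sort (illustration only).

    All parameter names and defaults are assumptions made for this
    example; they are not taken from the paper.
    """
    rng = rng or random.Random(0)
    keys = list(keys)
    # Small inputs are handed to a base-case sort.
    if len(keys) <= small:
        return sorted(keys)

    # 1. Draw a random sample of the input and sort it.
    sample_size = min(len(keys), num_buckets * oversampling)
    sample = sorted(rng.sample(keys, sample_size))

    # 2. Pick num_buckets - 1 evenly spaced splitters from the sample.
    step = sample_size // num_buckets
    splitters = [sample[i * step] for i in range(1, num_buckets)]

    # 3. Route every key to the bucket delimited by the splitters.
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bisect.bisect_right(splitters, key)].append(key)

    # Degenerate split (for example, all keys equal): fall back to the
    # base-case sort so the recursion always terminates.
    if max(len(b) for b in buckets) == len(keys):
        return sorted(keys)

    # 4. Sort each bucket recursively and concatenate in bucket order.
    out = []
    for bucket in buckets:
        out.extend(sample_sort(bucket, num_buckets, oversampling, small, rng))
    return out
```

Because every key in bucket i is no larger than every key in bucket i + 1, concatenating the sorted buckets yields a sorted result. One distribution pass splits the input into several buckets at once, which is what distinguishes multi-way approaches from two-way partitioning as in quicksort.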
Taxonomy
Topics: Network Packet Processing and Optimization · Parallel Computing and Optimization Techniques · Algorithms and Data Compression
