Comparison of parallel sorting algorithms
Darko Bozidar, Tomaz Dobravec

TL;DR
This paper compares seven sequential and parallel sorting algorithms implemented on CPU and GPU, analyzing their performance across various input types and distributions to identify the most efficient methods.
Contribution
It provides a comprehensive comparison of seven sorting algorithms on CPU and GPU, including implementation improvements and performance analysis across diverse data types.
Findings
Parallel GPU implementations generally outperform their sequential CPU counterparts.
Performance varies significantly with input distribution and data size.
Certain algorithms are more suitable for specific data types and hardware.
Abstract
In our study we implemented and compared seven sequential and parallel sorting algorithms: bitonic sort, multistep bitonic sort, adaptive bitonic sort, merge sort, quicksort, radix sort and sample sort. Sequential algorithms were implemented on a central processing unit using C++, whereas parallel algorithms were implemented on a graphics processing unit using the CUDA platform. We chose these algorithms because, to the best of our knowledge, their sequential and parallel implementations had not yet been compared all together in the same execution environment. We improved the above-mentioned implementations and adapted them to sort input sequences of arbitrary length. We compared the algorithms on six different input distributions, which consisted of 32-bit numbers, 32-bit key-value pairs, 64-bit numbers and 64-bit key-value pairs. In this report we give a short description of seven…
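The abstract notes that the implementations were adapted to sort sequences of arbitrary length, which matters for bitonic sort because the classic sorting network assumes a power-of-two length. The C++ sketch below illustrates one common way to lift that restriction: pad the input with the largest possible key, run the network, then drop the padding. This is an assumption made for illustration and not necessarily the technique used in the paper; the function name bitonic_sort_any_length is hypothetical, and the sketch handles 32-bit keys only, without key-value pairs.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not from the paper): sorts 32-bit keys in ascending
// order with a sequential, iterative bitonic sorting network.
// Assumption: arbitrary lengths are handled by padding to a power of two.
void bitonic_sort_any_length(std::vector<std::uint32_t>& data) {
    const std::size_t n = data.size();
    if (n < 2) return;

    // Round the length up to the next power of two.
    std::size_t padded = 1;
    while (padded < n) padded <<= 1;

    // Pad with the maximum key so that all padding sorts to the tail.
    data.resize(padded, std::numeric_limits<std::uint32_t>::max());

    // k is the size of the bitonic sequences being merged in this phase.
    for (std::size_t k = 2; k <= padded; k <<= 1) {
        // j is the distance between the two elements of a comparator.
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            for (std::size_t i = 0; i < padded; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;  // visit each pair only once

                // Blocks of size k alternate between ascending and descending.
                const bool ascending = (i & k) == 0;
                const bool out_of_order = ascending
                    ? data[i] > data[partner]
                    : data[i] < data[partner];
                if (out_of_order) std::swap(data[i], data[partner]);
            }
        }
    }

    // Drop the padding; the first n elements are the sorted input.
    data.resize(n);
}
```

Every iteration of the innermost loop touches a disjoint pair of elements, which is why this network maps naturally onto GPU threads. Padding with the maximum key is safe for keys alone, but key-value pairs would need extra care to keep padded entries apart from real entries that carry the same key.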
Taxonomy
Topics: Algorithms and Data Compression · Chaos-based Image/Signal Encryption · Parallel Computing and Optimization Techniques
