A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Elias Stehle; Hans-Arno Jacobsen

arXiv:1611.01137·cs.DB·May 22, 2017

TL;DR

This paper introduces a hybrid radix sort for GPUs that almost halves the memory transfers required by earlier GPU radix sorts. It sorts 2 GB of data 2.32x faster than the state-of-the-art GPU radix sort and improves end-to-end sorting of 64 GB of data by more than 2x over CPU-based radix sort.

Contribution

The paper presents a GPU radix sort that nearly halves memory transfers, which lifts the memory bandwidth limit on sorting performance. It extends this approach with a pipelined heterogeneous sorting algorithm for data that is larger than GPU memory or that resides outside it.

Findings

01. Achieves 2.32x faster sorting of 2 GB of data than the state-of-the-art GPU radix sort.

02. Maintains a speed-up of at least 1.66x on skewed distributions.

03. Improves end-to-end sorting of 64 GB of data by more than 2x compared to CPU-based radix sort.

Abstract

Sorting is at the core of many database operations, such as index creation, sort-merge joins, and user-requested output sorting. As GPUs are emerging as a promising platform to accelerate various operations, sorting on GPUs becomes a viable endeavour. Over the past few years, several improvements have been proposed for sorting on GPUs, leading to the first radix sort implementations that achieve a sorting rate of over one billion 32-bit keys per second. Yet, state-of-the-art approaches are heavily memory bandwidth-bound, as they require substantially more memory transfers than their CPU-based counterparts. Our work proposes a novel approach that almost halves the amount of memory transfers and, therefore, considerably lifts the memory bandwidth limitation. Being able to sort two gigabytes of eight-byte records in as little as 50 milliseconds, our approach achieves a 2.32-fold…
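To make the link between radix sort and memory traffic concrete, the following is a minimal sketch of a generic least-significant-digit radix sort that counts its passes. It is not the hybrid algorithm proposed in the paper, and it runs on the CPU. The function name `lsd_radix_sort` and the digit widths used below are illustrative assumptions. Each pass reads every key and writes every key once, so fewer passes mean fewer memory transfers.

```python
def lsd_radix_sort(keys, key_bits=32, digit_bits=8):
    """Sort non-negative integers below 2**key_bits.

    Returns (sorted_keys, passes). Illustrative only: a plain LSD radix
    sort, not the paper's hybrid GPU algorithm.
    """
    mask = (1 << digit_bits) - 1
    passes = 0
    for shift in range(0, key_bits, digit_bits):
        # Histogram of the current digit (one read of all keys).
        counts = [0] * (1 << digit_bits)
        for k in keys:
            counts[(k >> shift) & mask] += 1

        # Exclusive prefix sum: starting offset of each digit's bucket.
        offsets = []
        total = 0
        for c in counts:
            offsets.append(total)
            total += c

        # Stable scatter: each key is read once and written once.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1

        keys = out
        passes += 1
    return keys, passes


if __name__ == "__main__":
    data = [170, 45, 75, 90, 802, 24, 2, 66, 4000000000]
    for bits in (4, 8):  # assumed digit widths, chosen for illustration
        result, passes = lsd_radix_sort(data, key_bits=32, digit_bits=bits)
        print(bits, "bits per digit:", passes, "passes, sorted:", result == sorted(data))
```

In this sketch, doubling the digit width from 4 to 8 bits halves the number of passes over 32-bit keys from 8 to 4, which is the general sense in which processing more bits per pass reduces the data moved through memory.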
