Bandwidth-Optimal Random Shuffling for GPUs

Rory Mitchell; Daniel Stokes; Eibe Frank; Geoffrey Holmes

arXiv:2106.06161 · cs.DC · February 4, 2022

TL;DR

This paper introduces a GPU-optimized pseudo-random permutation algorithm called bijective shuffle, which minimizes global memory operations and significantly outperforms existing methods in speed and bandwidth utilization.

Contribution

The paper presents a novel parallel shuffling algorithm for GPUs that reduces memory transactions and improves efficiency compared to prior approaches.

Findings

1. Outperforms competing algorithms by 10-100x in speed
2. Approaches peak GPU bandwidth in experiments
3. Provides a statistical test for permutation quality

Abstract

Linear-time algorithms that are traditionally used to shuffle data on CPUs, such as the method of Fisher-Yates, are not well suited to implementation on GPUs due to inherent sequential dependencies, and existing parallel shuffling algorithms are unsuitable for GPU architectures because they incur a large number of read/write operations to high-latency global memory. To address this, we provide a method of generating pseudo-random permutations in parallel by fusing suitable pseudo-random bijective functions with stream compaction operations. Our algorithm, termed 'bijective shuffle', trades increased per-thread arithmetic operations for reduced global memory transactions. It is work-efficient, deterministic, and only requires a single global memory read and write per shuffle input, thus maximising use of global memory bandwidth. To empirically demonstrate the correctness of the algorithm,…
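The idea of fusing a pseudo-random bijection with stream compaction can be illustrated with a small sequential sketch. This is not the authors' CUDA implementation. The balanced Feistel network, its round function, the choice of four rounds, and the names `feistel_bijection` and `bijective_shuffle` are all assumptions made for illustration. The sketch pads the index range to a power of two m >= n, maps every index through a keyed bijection on [0, m), and keeps only the images that fall below n. On a GPU each index would be handled by its own thread, and the filtering step would be a parallel stream compaction rather than a loop.

```python
import random


def feistel_bijection(index, half_bits, round_keys):
    """Balanced Feistel network: a bijection on [0, 2**(2 * half_bits)).

    A Feistel network is invertible for any round function, so the mixer
    below is only an illustrative choice, not the one used in the paper.
    """
    mask = (1 << half_bits) - 1
    left, right = index >> half_bits, index & mask
    for key in round_keys:
        mixed = ((right ^ key) * 0x9E3779B1) & 0xFFFFFFFF
        mixed ^= mixed >> 15
        left, right = right, left ^ (mixed & mask)
    return (left << half_bits) | right


def bijective_shuffle(items, seed):
    """Return a pseudo-random permutation of items, fixed by seed."""
    n = len(items)
    # Pad the domain to m = 2**(2 * half_bits), the smallest such m >= n.
    half_bits = max(1, ((max(n, 1) - 1).bit_length() + 1) // 2)
    m = 1 << (2 * half_bits)

    rng = random.Random(seed)
    round_keys = [rng.getrandbits(32) for _ in range(4)]  # assumed: 4 rounds

    out = []
    for i in range(m):  # one GPU thread per i in a parallel version
        j = feistel_bijection(i, half_bits, round_keys)
        if j < n:  # compaction: drop images that land in the padding
            out.append(items[j])  # one read and one write per input element
    return out


if __name__ == "__main__":
    print(bijective_shuffle(list(range(10)), seed=42))
```

Because the map is a bijection on [0, m), every index below n is produced exactly once, so the compacted output is a permutation of the input. The same seed always yields the same permutation, which matches the determinism described in the abstract. The statistical quality of the result depends entirely on the bijection chosen, and this toy round function makes no claim in that regard.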

Code & Models

Repositories

djns99/CUDA-Shuffle