Weighted Random Sampling on GPUs
Hans-Peter Lehmann, Lorenz Hübschle-Schneider, Peter Sanders

TL;DR
This paper adapts the PSA algorithm for alias table construction to GPUs, achieving a 17x construction speedup, up to 24x higher sampling throughput, and several times lower energy use for weighted random sampling compared to a CPU implementation.
Contribution
It introduces a GPU-based parallel construction algorithm for alias tables that relies on coalesced memory access, staging of data in shared memory, and a parallel generalization of binary search, outperforming the CPU method in both speed and energy efficiency.
Findings
17x faster alias table construction on a consumer GPU than PSA on a 16-core desktop CPU
Up to 24x higher sampling throughput
Several times lower energy consumption for both construction and sampling
Abstract
An alias table is a data structure that allows for efficiently drawing weighted random samples in constant time and can be constructed in linear time. The PSA algorithm by Hübschle-Schneider and Sanders constructs alias tables in parallel on the CPU. In this report, we transfer the PSA algorithm to the GPU. Our construction algorithm achieves a speedup of 17 on a consumer GPU in comparison to the PSA method on a 16-core high-end desktop CPU. For sampling, we achieve an up to 24 times higher throughput. Both operations also require several times less energy than on the CPU. Adaptations that help to achieve this include changing memory access patterns so that accesses are coalesced. Where this is not possible, we first copy data to the faster shared memory using coalesced access. We also enhance a generalization of binary search that searches for a range of items in parallel.…
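For readers unfamiliar with the data structure, the following is a minimal sequential sketch of an alias table in Python. It uses the classic light/heavy pairing construction (often attributed to Vose), not the parallel PSA algorithm or the GPU variant described in the abstract. The function names `build_alias_table` and `sample` are illustrative and do not come from the paper.

```python
import random


def build_alias_table(weights):
    """Build an alias table sequentially in O(n) time.

    Returns two lists of length n:
      prob[i]  - probability of returning i when bucket i is chosen
      alias[i] - item returned otherwise
    """
    n = len(weights)
    total = sum(weights)
    # Scale the weights so that the average bucket has weight exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n
    alias = list(range(n))
    light = [i for i, s in enumerate(scaled) if s < 1.0]
    heavy = [i for i, s in enumerate(scaled) if s >= 1.0]
    while light and heavy:
        l = light.pop()
        h = heavy.pop()
        # Bucket l keeps its own weight and is topped up by heavy item h.
        prob[l] = scaled[l]
        alias[l] = h
        scaled[h] -= 1.0 - scaled[l]
        # The heavy item may have become light after donating weight.
        (light if scaled[h] < 1.0 else heavy).append(h)
    # Buckets left over (due to rounding) keep prob 1.0 and alias themselves.
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one weighted sample in O(1): pick a bucket, then flip a coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Each sample needs only one uniformly chosen bucket and one comparison, which is what makes the constant-time claim in the abstract possible. The construction loop above is inherently sequential, which is the part that parallel algorithms such as PSA restructure.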
Taxonomy
Topics: Algorithms and Data Compression · Data Management and Algorithms · Network Packet Processing and Optimization
