Scalable Hash Table for NUMA Systems
Alok Tripathy, Oded Green

TL;DR
This paper introduces a multi-GPU hash table whose single-node throughput is comparable to that of large CPU clusters. It is built from sparse-graph data structures and binning techniques that parallelize well on GPUs.
Contribution
The work presents a novel multi-GPU hash table algorithm that matches the performance of large CPU clusters. It builds the table from sparse-graph data structures and binning techniques, both of which expose the massive parallelism that GPU architectures require.
Findings
Processes 8 billion keys per second with 32-bit keys.
Achieves 4 times the speed of previous single-GPU implementations.
Performs comparably to 500-1,000-core CPU clusters.
Abstract
Hash tables are used in many applications, including database operations, DNA sequencing, and string searching. As a result, there are many parallel hash tables targeting multicore, distributed, and accelerator-based systems. In this work, we present a multi-GPU hash table implementation that can process keys at a throughput comparable to that of distributed hash tables. Distributed CPU hash tables have received significantly more attention than GPU-based hash tables. We show that a single node with multiple GPUs offers roughly the same performance as a 500-1,000-core CPU-based cluster. The key component of our algorithm is the use of multiple sparse-graph data structures and binning techniques to build the hash table. Each of these components has previously been shown to admit massive parallelism that is amenable to GPU acceleration. Since we focus on an…
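The abstract names binning and sparse-graph data structures as the core building blocks but does not spell out the table layout. The Python sketch below is one plausible reading under stated assumptions. Keys are counting-sorted by owning GPU, and then each GPU's share is counting-sorted into buckets. The resulting offsets array plays the role of the row-pointer array in a CSR (compressed sparse row) graph. The hash function `hash32`, the helper `bin_by`, the class `MultiDeviceHashTable`, and the way hash bits are split between device and bucket are all hypothetical illustrations, not the authors' implementation. Each sequential loop stands in for what a GPU would run as a parallel histogram, prefix scan, and scatter.

```python
from typing import Callable, List, Tuple


def hash32(key: int) -> int:
    """Hypothetical multiplicative hash for 32-bit keys (Knuth's constant)."""
    return (key * 2654435761) & 0xFFFFFFFF


def bin_by(keys: List[int], num_bins: int,
           bin_of: Callable[[int], int]) -> Tuple[List[int], List[int]]:
    """Counting-sort keys into bins: histogram, exclusive prefix sum, scatter.

    Returns (offsets, binned), where the keys of bin i are stored in
    binned[offsets[i]:offsets[i + 1]], i.e. a CSR row-pointer layout.
    """
    ids = [bin_of(k) for k in keys]
    offsets = [0] * (num_bins + 1)
    for i in ids:                  # histogram, shifted by one slot
        offsets[i + 1] += 1
    for i in range(num_bins):      # running sum turns counts into bin start positions
        offsets[i + 1] += offsets[i]
    cursor = offsets[:num_bins]    # per-bin write positions (slice is a copy)
    binned = [0] * len(keys)
    for k, i in zip(keys, ids):    # stable scatter into each bin
        binned[cursor[i]] = k
        cursor[i] += 1
    return offsets, binned


class MultiDeviceHashTable:
    """Hypothetical sketch: bin keys by owning device, then store each
    device's share as a CSR-style bucket array."""

    def __init__(self, keys: List[int], num_devices: int, num_buckets: int) -> None:
        self.d, self.b = num_devices, num_buckets
        dev_off, by_dev = bin_by(keys, self.d, self._device)
        # One (offsets, keys) CSR table per simulated device.
        self.tables = [
            bin_by(by_dev[dev_off[g]:dev_off[g + 1]], self.b, self._bucket)
            for g in range(self.d)
        ]

    def _device(self, key: int) -> int:
        return hash32(key) % self.d

    def _bucket(self, key: int) -> int:
        # Use different hash bits than the device choice so buckets stay balanced.
        return (hash32(key) // self.d) % self.b

    def contains(self, key: int) -> bool:
        offsets, keys = self.tables[self._device(key)]
        b = self._bucket(key)
        return key in keys[offsets[b]:offsets[b + 1]]
```

For example, you could build `MultiDeviceHashTable(list(range(0, 1000, 3)), num_devices=4, num_buckets=16)`. With that table, `contains(300)` returns `True` and `contains(301)` returns `False`. In this sketch the build needs no locks or collision probing, because the prefix sum fixes every key's final slot before the scatter. That is one reason counting-sort-style builds map well onto GPUs.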
Taxonomy
Topics: Caching and Content Delivery · Algorithms and Data Compression · Advanced Data Storage Technologies
