HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage
Haidong Rong, Jiashu Yao, Matthias Langer, Shijie Liu, Li Fan, Dongxin Wang, Jia He, Jinglin Chen, Jiaheng Rang, Julian Qian, Mengyao Xu, Fan Yu, Minseok Lee, Zehuan Wang, and Even Oldridge

TL;DR
HierarchicalKV (HKV) is a GPU hash table with cache semantics: when a bucket is full, an upsert is resolved in place by policy-driven eviction or admission rejection instead of rehashing or failing. This targets online embedding tables that exceed single-GPU HBM capacity.
Contribution
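The paper presents HKV, which the authors describe as the first general-purpose GPU hash table library whose normal full-capacity operating contract is cache-semantic. It combines four core mechanisms (cache-line-aligned buckets, in-line score-driven upsert, score-based dynamic dual-bucket selection, and triple-group concurrency) with tiered key-value separation to scale beyond HBM, in contrast to dictionary-style tables that preserve every inserted key.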
Findings
Achieves a throughput of up to 3.9 billion key-value pairs per second on an NVIDIA H100 NVL GPU.
Delivers 1.4x higher find throughput than WarpCore.
Maintains stable performance across load factors 0.50-1.00.
Abstract
Traditional GPU hash tables preserve every inserted key -- a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity. We challenge this assumption with cache semantics, where policy-driven eviction is a first-class operation. We introduce HierarchicalKV (HKV), the first general-purpose GPU hash table library whose normal full-capacity operating contract is cache-semantic: each full-bucket upsert (update-or-insert) is resolved in place by eviction or admission rejection rather than by rehashing or capacity-induced failure. HKV co-designs four core mechanisms -- cache-line-aligned buckets, in-line score-driven upsert, score-based dynamic dual-bucket selection, and triple-group concurrency -- and uses tiered key-value separation as a scaling enabler beyond HBM. On an NVIDIA H100 NVL GPU, HKV achieves up to 3.9 billion…
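The full-bucket behavior described in the abstract can be illustrated with a small CPU-side sketch. This is not HKV's implementation or API: the class names `Bucket` and `CacheTable`, the modulo bucket mapping, the caller-supplied integer score, and the tie rule (an incoming entry whose score equals the lowest resident score is admitted) are all assumptions made for illustration. It shows only the contract: an upsert into a full bucket either evicts the lowest-scored entry in place or is rejected, and the table never rehashes or grows.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Bucket:
    """Fixed-capacity bucket; hypothetical stand-in for one GPU bucket."""
    capacity: int
    # key -> (value, score); higher score means more worth keeping (assumption)
    slots: Dict[int, Tuple[Any, int]] = field(default_factory=dict)

    def upsert(self, key: int, value: Any, score: int) -> Tuple[str, Optional[int]]:
        """Return (status, evicted_key). Never grows past `capacity`."""
        if key in self.slots:
            # Key already resident: update value and score in place.
            self.slots[key] = (value, score)
            return "updated", None
        if len(self.slots) < self.capacity:
            # Free slot available: plain insert.
            self.slots[key] = (value, score)
            return "inserted", None
        # Bucket is full: find the lowest-scored resident entry.
        victim = min(self.slots, key=lambda k: self.slots[k][1])
        if score < self.slots[victim][1]:
            # Admission rejection: the newcomer is less valuable than every resident.
            return "rejected", None
        # Eviction: replace the victim in place (ties admit the newcomer; assumption).
        del self.slots[victim]
        self.slots[key] = (value, score)
        return "evicted", victim


class CacheTable:
    """Fixed set of buckets; capacity is constant and there is no rehashing."""

    def __init__(self, num_buckets: int, bucket_capacity: int) -> None:
        self.buckets: List[Bucket] = [Bucket(bucket_capacity) for _ in range(num_buckets)]

    def _bucket(self, key: int) -> Bucket:
        # Simple modulo mapping (assumption; not the selection scheme from the paper).
        return self.buckets[key % len(self.buckets)]

    def upsert(self, key: int, value: Any, score: int) -> Tuple[str, Optional[int]]:
        return self._bucket(key).upsert(key, value, score)

    def find(self, key: int) -> Optional[Any]:
        entry = self._bucket(key).slots.get(key)
        return None if entry is None else entry[0]

    def size(self) -> int:
        return sum(len(b.slots) for b in self.buckets)
```

With a single bucket of capacity 2 holding keys 1 (score 10) and 2 (score 20), an upsert of key 3 with score 5 is rejected, while the same upsert with score 15 evicts key 1. In both cases the table stays at two entries.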
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Graph Theory and Algorithms
