Global Hash Tables Strike Back! An Analysis of Parallel GROUP BY Aggregation
Daniel Xue, Ryan Marcus

TL;DR
This paper demonstrates that a purpose-built, fully concurrent hash table can effectively perform parallel GROUP BY aggregation, challenging the dominance of partitioned approaches in modern database systems.
Contribution
It introduces a customized concurrent hash table for group aggregation and shows it can outperform traditional partitioning methods under certain conditions.
Findings
Purpose-built concurrent hash tables match or surpass partitioning techniques.
Resizing costs and memory pressure are manageable with the proposed approach.
Guidelines for database implementers are derived from experimental analysis.
Abstract
Efficiently computing group aggregations (i.e., GROUP BY) on modern architectures is critical for analytic database systems. Hash-based approaches in today's engines predominantly use a partitioned approach, in which incoming data is partitioned by key values so that every row for a particular key is sent to the same thread. In this paper, we revisit a simpler strategy: a fully concurrent aggregation technique using a shared hash table. While approaches using general-purpose concurrent hash tables have generally been found to perform worse than partitioning-based approaches, we argue that the key ingredient is customizing the concurrent hash table for the specific task of group aggregation. Through experiments on synthetic workloads (varying key cardinality, skew, and thread count), we demonstrate that in morsel-driven systems, a purpose-built concurrent hash table can match or surpass…
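To make the contrast in the abstract concrete, here is a minimal sketch of the two strategies computing a per-key SUM. It is not the authors' implementation. The function names, the morsel size, and the striped-lock table are assumptions chosen for illustration, and lock striping only stands in for the purpose-built concurrent hash table the paper describes. Because of Python's global interpreter lock, the sketch shows the data flow only and says nothing about performance.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partitioned_group_sum(rows, num_threads=4):
    """Partitioned strategy: route rows by hash(key) so one thread owns each key."""
    partitions = [[] for _ in range(num_threads)]
    for key, value in rows:
        partitions[hash(key) % num_threads].append((key, value))

    def aggregate(partition):
        local = defaultdict(int)  # thread-private table, no synchronization
        for key, value in partition:
            local[key] += value
        return local

    result = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for local in pool.map(aggregate, partitions):
            result.update(local)  # key sets are disjoint across partitions
    return result


def shared_table_group_sum(rows, num_threads=4, num_stripes=16, morsel_size=1024):
    """Shared strategy: every thread updates one table guarded by striped locks."""
    # Hypothetical stand-in for a concurrent hash table: one dict and lock per stripe.
    stripes = [defaultdict(int) for _ in range(num_stripes)]
    locks = [threading.Lock() for _ in range(num_stripes)]

    def consume(morsel):
        for key, value in morsel:
            s = hash(key) % num_stripes
            with locks[s]:  # protects the read-modify-write on this stripe
                stripes[s][key] += value

    # Morsel-driven scheduling: workers pull fixed-size chunks of the input.
    morsels = [rows[i:i + morsel_size] for i in range(0, len(rows), morsel_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(consume, morsels))  # drain to surface worker exceptions

    result = {}
    for stripe in stripes:
        result.update(stripe)
    return result


rows = [(i % 7, 1) for i in range(10_000)]
assert partitioned_group_sum(rows) == shared_table_group_sum(rows)
```

In the partitioned version the input is reshuffled before any aggregation happens, while the shared version skips that pass and instead pays for synchronization on every update. That trade-off is the one the paper's experiments examine.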
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Distributed systems and fault tolerance
