Skew Handling in Aggregate Streaming Queries on GPUs
Georgios Koutsoumpakis, Iakovos Koutsoumpakis, Anastasios Gounaris

TL;DR
This paper addresses load imbalance issues caused by data skew in GPU-based parallel aggregate queries, proposing a generic load-balancing framework and evaluating its effectiveness.
Contribution
It introduces what the authors believe to be the first runtime load-balancing techniques for database operations on GPUs: a generic framework with several instantiations, evaluated experimentally.
Findings
Effective load balancing reduces skew impact
Framework improves query performance on skewed data
First runtime load-balancing approach for database operations on GPUs, to the best of the authors' knowledge
Abstract
Nowadays, the data to be processed by database systems has grown so large that any conventional, centralized technique is inadequate. At the same time, general-purpose computation on GPUs (GPGPU) has recently drawn attention from the data management community due to its ability to achieve significant speed-ups at a small cost. Efficient skew handling is a well-known problem in parallel queries, independently of the execution environment. In this work, we investigate solutions to the problem of load imbalances in parallel aggregate queries on GPUs that are caused by skewed data. We present a generic load-balancing framework along with several instantiations, which we experimentally evaluate. To the best of our knowledge, this is the first attempt to present runtime load-balancing techniques for database operations on GPUs.
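To make the problem concrete, the following is a minimal CPU-side sketch of how skewed group sizes unbalance a parallel aggregate, and how reassigning groups at runtime can even out the work. It is an illustration only and is not the framework from the paper: the function names, the modulo-based static assignment, and the greedy largest-first reassignment are all assumptions chosen for brevity, and "workers" stand in loosely for GPU thread blocks.

```python
from collections import Counter
import heapq


def skewed_keys(num_keys=16, scale=1000):
    """Hypothetical skewed input: key k (1-based) appears scale // k times."""
    keys = []
    for k in range(1, num_keys + 1):
        keys.extend([k] * (scale // k))
    return keys


def static_assignment(keys, num_workers):
    """Assign each group to worker (key mod num_workers); return tuples per worker."""
    load = [0] * num_workers
    for key, count in Counter(keys).items():
        load[key % num_workers] += count
    return load


def greedy_assignment(keys, num_workers):
    """Assign groups, largest first, to the currently least-loaded worker."""
    load = [0] * num_workers
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    for _key, count in Counter(keys).most_common():
        current, w = heapq.heappop(heap)
        load[w] = current + count
        heapq.heappush(heap, (load[w], w))
    return load


def imbalance(load):
    """Ratio of the busiest worker's load to the mean load (1.0 is perfect)."""
    return max(load) / (sum(load) / len(load))


if __name__ == "__main__":
    keys = skewed_keys()
    for name, assign in (("static", static_assignment), ("greedy", greedy_assignment)):
        load = assign(keys, 4)
        print(name, load, round(imbalance(load), 3))
```

Because the slowest worker determines when the aggregate finishes, the imbalance ratio is a rough proxy for wasted parallel capacity. Note that no reassignment of whole groups can push the busiest worker below the size of the largest single group; handling that case requires splitting a group across workers and merging partial aggregates, which this sketch does not attempt.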
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Distributed systems and fault tolerance
