Terabyte-Scale Analytics in the Blink of an Eye
Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen

TL;DR
This paper demonstrates that GPU clusters combined with ML/HPC techniques can substantially accelerate distributed SQL query processing, achieving at least a 60× performance improvement and running TPC-H at 1TB scale in seconds.
Contribution
The paper introduces a GPU-optimized prototype for large-scale SQL analytics that significantly outperforms traditional CPU-based solutions, pointing to a paradigm shift in data analytics performance.
Findings
Achieved at least 60× performance gains over existing systems.
Successfully ran all 22 TPC-H queries at 1TB scale.
Demonstrated the potential of GPU clusters for terabyte-scale analytics.
Abstract
For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of a paradigm shift. The scaling laws and popularity of AI models lead to the deployment of incredibly powerful GPU clusters in commercial data centers. Compared to CPU-only solutions, these clusters deliver impressive improvements in per-node compute, memory bandwidth, and inter-node interconnect performance. In this paper, we study the problem of scaling analytical SQL queries on distributed clusters of GPUs, with the stated goal of establishing an upper bound on the likely performance gains. To do so, we build a prototype designed to maximize performance by leveraging ML/HPC best practices, such as group communication primitives for cross-device data movements. This allows us to conduct…
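The abstract mentions group communication primitives for cross-device data movement. As a rough illustration of the idea, and not the authors' implementation, the sketch below simulates an all-to-all exchange on the CPU with plain Python lists: each simulated device hash-partitions its local rows by key, every device receives the bucket destined for it, and each device then aggregates only the keys it owns. The function names (`partition_by_key`, `all_to_all`, `distributed_group_sum`), the integer keys, and the modulo partitioning are all assumptions made for this example.

```python
from collections import defaultdict


def partition_by_key(rows, num_devices):
    """Split one device's local (key, value) rows into per-destination buckets."""
    buckets = [[] for _ in range(num_devices)]
    for key, value in rows:
        # Assumption: integer keys, partitioned by simple modulo.
        buckets[key % num_devices].append((key, value))
    return buckets


def all_to_all(send_buffers):
    """Simulate an all-to-all collective.

    send_buffers[src][dst] holds the rows that device `src` sends to device `dst`.
    The result at index `dst` is everything device `dst` receives.
    """
    n = len(send_buffers)
    return [
        [row for src in range(n) for row in send_buffers[src][dst]]
        for dst in range(n)
    ]


def distributed_group_sum(local_rows_per_device):
    """Compute SUM(value) GROUP BY key across simulated devices."""
    n = len(local_rows_per_device)
    send = [partition_by_key(rows, n) for rows in local_rows_per_device]
    received = all_to_all(send)
    results = []
    for rows in received:
        sums = defaultdict(int)
        for key, value in rows:
            sums[key] += value
        results.append(dict(sums))
    return results


# Two simulated devices, each holding a slice of the table.
devices = [
    [(0, 10), (1, 20), (2, 30)],
    [(0, 1), (1, 2), (3, 4)],
]
result = distributed_group_sum(devices)
```

After the exchange, every key lives on exactly one device, so each device can finish its share of the aggregation without further communication. On a real GPU cluster the exchange would typically be a single collective call from a communication library rather than a Python loop.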
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Graph Theory and Algorithms
