Rethinking Analytical Processing in the GPU Era
Bobbi Yogatama, Yifei Yang, Kevin Kristensen, Devesh Sarda, Abigale Kim, Adrian Cockcroft, Yu Teng, Joshua Patterson, Gregory Kimball, Wes McKinney, Weiwei Gong, Xiangyao Yu

TL;DR
This paper introduces Sirius, a GPU-native SQL engine that leverages recent hardware and software advances to significantly accelerate data analytics, achieving up to 12.5x speedup and improved cost efficiency.
Contribution
The paper presents Sirius, a novel GPU-native SQL engine that provides drop-in acceleration for existing data systems using standard query representations.
Findings
Sirius achieves up to a 12.5x speedup when integrated with Apache Doris.
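Sirius achieves 8.3x better cost efficiency on TPC-H and 7.4x on ClickBench when integrated with single-node DuckDB.
Sirius plugs into existing databases through the standard Substrait query representation, replacing the CPU engine without changing the user-facing interface.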
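Speedup and cost efficiency are different metrics, and a short calculation may make the distinction concrete. This summary does not state how the paper computes cost efficiency, so the sketch below assumes a common definition: the cost of a run is runtime multiplied by the hourly instance price, and the gain is the ratio of CPU cost to GPU cost. The function names and all numbers are hypothetical and are not taken from the paper.

```python
def cost_per_run(runtime_s: float, price_per_hour: float) -> float:
    """Dollar cost of one run: runtime in hours times the hourly price (assumed definition)."""
    return runtime_s / 3600.0 * price_per_hour


def cost_efficiency_gain(cpu_runtime_s: float, cpu_price_per_hour: float,
                         gpu_runtime_s: float, gpu_price_per_hour: float) -> float:
    """Ratio of CPU cost to GPU cost for the same workload; above 1 means the GPU run is cheaper."""
    cpu_cost = cost_per_run(cpu_runtime_s, cpu_price_per_hour)
    gpu_cost = cost_per_run(gpu_runtime_s, gpu_price_per_hour)
    return cpu_cost / gpu_cost


# Hypothetical inputs: the GPU instance is 10x faster but costs 2x more per hour.
gain = cost_efficiency_gain(
    cpu_runtime_s=100.0, cpu_price_per_hour=2.0,
    gpu_runtime_s=10.0, gpu_price_per_hour=4.0,
)
print(f"{gain:.1f}x")  # prints 5.0x
```

Under this assumed definition, a 10x speedup on hardware that costs twice as much per hour yields a 5x cost-efficiency gain, which is why the two figures are reported separately.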
Abstract
The era of GPU-powered data analytics has arrived. In this paper, we argue that recent advances in hardware (e.g., larger GPU memory, faster interconnect and IO, and declining cost) and software (e.g., composable data systems and mature libraries) have removed the key barriers that have limited the wider adoption of GPU data analytics. We present Sirius, a prototype open-source GPU-native SQL engine that offers drop-in acceleration for diverse data systems. Sirius treats GPU as the primary engine and leverages libraries like libcudf for high-performance relational operators. It provides drop-in acceleration for existing databases by leveraging the standard Substrait query representation, replacing the CPU engine without changing the user-facing interface. Sirius achieves 8.3x and 7.4x better cost efficiency on TPC-H and ClickBench, respectively, when integrated with single-node DuckDB,…
