GPU-Augmented OLAP Execution Engine: GPU Offloading
Ilsun Chang

TL;DR
This paper introduces a hybrid GPU-accelerated OLAP execution engine that selectively offloads high-impact primitives using risk-aware gating, reducing data movement and improving tail latency in large-scale analytical queries.
Contribution
It proposes a novel risk-aware gating mechanism for selective GPU offloading in OLAP execution, extending previous optimizer-stage gating to execution primitives.
Findings
Improved tail latency (P95/P99) with gated offloading
Reduced data transfer via key-only transfer and late materialization
Effective offloading of high-impact primitives improves performance
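The key-only transfer with late materialization noted in the findings can be illustrated with a small CPU-only sketch. It is not the paper's implementation: the function name `top_k_key_only`, the row layout, and the use of `heapq.nlargest` as a stand-in for a GPU selection kernel are all assumptions made for illustration.

```python
import heapq

def top_k_key_only(rows, key_index, k):
    """Return the k rows with the largest key, largest first (hypothetical helper)."""
    # Step 1: extract only (key, row pointer) pairs. In a real engine this compact
    # array is what would be shipped to the GPU instead of full rows.
    key_ptr = [(row[key_index], ptr) for ptr, row in enumerate(rows)]

    # Step 2: select the winners over the compact pairs. heapq.nlargest is a
    # CPU stand-in for the offloaded Top-K kernel (assumption).
    winners = heapq.nlargest(k, key_ptr, key=lambda kp: kp[0])

    # Step 3: late materialization. Only the k winning pointers are
    # dereferenced back into full rows on the host side.
    return [rows[ptr] for _, ptr in winners]

rows = [
    (1, "alice", 30.0),
    (2, "bob", 75.5),
    (3, "carol", 12.25),
    (4, "dave", 98.0),
]
top2 = top_k_key_only(rows, key_index=2, k=2)
print(top2)
```

The point of the design is that wide payload columns never cross the host-device boundary; only keys and pointers do, and full rows are fetched for the final k results alone.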
Abstract
Modern OLAP systems have mitigated I/O bottlenecks via storage-compute separation and columnar layouts, but CPU costs in the execution layer (especially Top-K selection and join probe) are emerging as new bottlenecks at scale. This paper proposes a hybrid architecture that augments existing vectorized execution by selectively offloading only high-impact primitives to the GPU. To reduce data movement, we use key-only transfer (keys and pointers) with late materialization. We further introduce a Risky Gate (risk-aware gating) that triggers offloading only in gain/risk intervals based on input size, transfer, kernel and post-processing costs, and candidate-set complexity (K, M). Using PostgreSQL microbenchmarks and GPU proxy measurements, we observe improved tail latency (P95/P99) under gated offloading compared to always-on GPU offloading. This work extends the risk-aware gating principle…
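A minimal sketch of how a risk-aware gate of this kind might decide between CPU and GPU execution is shown below. The abstract names the inputs (input size, transfer, kernel and post-processing costs, and K and M) but not the decision rule, so everything concrete here is an assumption: the `CostEstimate` fields, the thresholds, the `gpu_jitter_ms` risk term, and the reading of M as the candidate-set size handed to post-processing.

```python
from dataclasses import dataclass

@dataclass
class CostEstimate:
    """Hypothetical per-operator cost estimates, all in milliseconds."""
    cpu_ms: float         # estimated cost of running the primitive on the CPU
    transfer_ms: float    # host-to-device and device-to-host transfer
    kernel_ms: float      # GPU kernel execution
    post_ms: float        # host-side post-processing (e.g. materialization)
    gpu_jitter_ms: float  # assumed spread of the GPU path, used as the risk term

def should_offload(est, n_rows, k, m,
                   min_rows=100_000, max_candidates=1_000_000, risk_factor=2.0):
    """Return True only when the expected gain clearly outweighs the risk."""
    # Small inputs cannot amortize the fixed transfer cost (assumed threshold).
    if n_rows < min_rows:
        return False
    # Very large K or M inflates post-processing; stay on the CPU (assumed rule).
    if max(k, m) > max_candidates:
        return False
    gpu_ms = est.transfer_ms + est.kernel_ms + est.post_ms
    gain_ms = est.cpu_ms - gpu_ms
    # Offload only if the gain exceeds a multiple of the GPU path's variability.
    return gain_ms > risk_factor * est.gpu_jitter_ms

small = CostEstimate(cpu_ms=4.0, transfer_ms=3.0, kernel_ms=0.5, post_ms=0.5, gpu_jitter_ms=1.0)
large = CostEstimate(cpu_ms=120.0, transfer_ms=20.0, kernel_ms=5.0, post_ms=10.0, gpu_jitter_ms=8.0)
risky = CostEstimate(cpu_ms=40.0, transfer_ms=20.0, kernel_ms=5.0, post_ms=10.0, gpu_jitter_ms=8.0)

decisions = [
    should_offload(small, n_rows=50_000, k=10, m=100),
    should_offload(large, n_rows=5_000_000, k=100, m=10_000),
    should_offload(risky, n_rows=5_000_000, k=100, m=10_000),
]
print(decisions)
```

Requiring the gain to exceed a multiple of the variability, rather than merely be positive, is one way to bias the gate toward tail latency: marginal wins that could turn into P99 regressions stay on the CPU path.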
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Parallel Computing and Optimization Techniques
