PystachIO: Efficient Distributed GPU Query Processing with PyTorch over Fast Networks & Fast Storage

Jigao Luo; Nils Boeschen; Muhammad El-Hindi; Carsten Binnig

arXiv:2512.02862 · cs.DB · May 21, 2026

TL;DR

PystachIO is a distributed GPU query engine built on PyTorch that optimizes network and storage I/O for large-scale OLAP workloads, achieving up to 3x end-to-end speedups over existing approaches.

Contribution

The paper introduces PystachIO, a PyTorch-based distributed OLAP engine with optimizations that maximize GPU, network, and storage utilization in large-scale environments.

Findings

01. Up to 3x end-to-end speedups over existing approaches.

02. Effective overlapping of computation and data movement improves utilization.

03. Supports scalable, storage-resident OLAP workloads.

Abstract

The AI hardware boom has led modern data centers to adopt HPC-style architectures centered on distributed, GPU-centric computation. Large GPU clusters interconnected by fast RDMA networks and backed by high-bandwidth NVMe storage enable scalable computation and rapid access to storage-resident data. Tensor computation runtimes (TCRs), such as PyTorch, originally designed for AI workloads, have recently been shown to accelerate analytical workloads. However, prior work has primarily considered settings where the data fits in aggregated GPU memory. In this paper, we systematically study how TCRs can support scalable, distributed query processing for large-scale, storage-resident OLAP workloads. Although TCRs provide abstractions for network and storage I/O, naive use often underutilizes GPU and I/O bandwidth due to insufficient overlap between computation and data movement. As a core…
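The abstract attributes underutilization to insufficient overlap between computation and data movement. The following is a minimal sketch of that general idea, not PystachIO's implementation: it uses only the Python standard library, with `time.sleep` standing in for storage or network reads and for GPU operators. All names (`load_chunk`, `process_chunk`, `run_serial`, `run_overlapped`, `depth`) are hypothetical and chosen for illustration.

```python
import queue
import threading
import time


def load_chunk(i, io_seconds):
    """Hypothetical stand-in for a storage or network read of chunk i."""
    time.sleep(io_seconds)  # simulated I/O latency
    return list(range(i * 4, i * 4 + 4))


def process_chunk(chunk, compute_seconds):
    """Hypothetical stand-in for a GPU operator (here: a partial sum)."""
    time.sleep(compute_seconds)  # simulated compute time
    return sum(chunk)


def run_serial(num_chunks, io_seconds, compute_seconds):
    """Naive loop: each read finishes before its compute starts."""
    total = 0
    for i in range(num_chunks):
        total += process_chunk(load_chunk(i, io_seconds), compute_seconds)
    return total


def run_overlapped(num_chunks, io_seconds, compute_seconds, depth=2):
    """Prefetching loop: a loader thread fills a bounded queue of
    at most `depth` chunks while the main thread computes."""
    buffers = queue.Queue(maxsize=depth)

    def producer():
        for i in range(num_chunks):
            buffers.put(load_chunk(i, io_seconds))  # blocks when queue is full
        buffers.put(None)  # sentinel: no more chunks

    loader = threading.Thread(target=producer, daemon=True)
    loader.start()

    total = 0
    while True:
        chunk = buffers.get()
        if chunk is None:
            break
        total += process_chunk(chunk, compute_seconds)
    loader.join()
    return total


if __name__ == "__main__":
    for fn in (run_serial, run_overlapped):
        start = time.perf_counter()
        result = fn(8, 0.05, 0.05)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: result={result}, elapsed={elapsed:.2f}s")
```

Both functions return the same result; the overlapped variant hides most of the simulated I/O time behind compute when the two costs are similar. On a real GPU system the same pattern would typically rely on asynchronous copies and multiple streams rather than a Python thread, which this sketch does not attempt to model.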

Taxonomy

Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Parallel Computing and Optimization Techniques