Pre-Execution Query Slot-Time Prediction in Cloud Data Warehouses: A Feature-Scoped Machine Learning Approach

Prashant Kumar Pathak

arXiv:2604.20145 · cs.DB · April 23, 2026

TL;DR

This paper introduces a machine learning model to predict BigQuery slot-time before execution using only pre-execution signals, improving cost estimation accuracy in cloud data warehouses.

Contribution

The paper presents a feature-scoped ML approach with a dual-model architecture that outperforms simple baselines on cost-significant queries.

Findings

01 The model achieves 74% explained variance on the full workload.

02 On cost-significant queries, MAE is reduced by 30-37% relative to simple baselines.

03 Long-tail queries remain hard to predict because of unobserved runtime factors.
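
The two headline metrics in the findings above can be made concrete with a small sketch. It assumes the usual definitions of explained variance and of relative MAE reduction against a baseline; this page does not say whether the paper computes them on raw or log-transformed slot-time, and the function names and numbers below are purely illustrative.

```python
from statistics import fmean, pvariance


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


def explained_variance(actual, predicted):
    """1 - Var(residuals) / Var(actual); 1.0 means a perfect fit."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return 1.0 - pvariance(residuals) / pvariance(actual)


def relative_mae_reduction(actual, model_pred, baseline_pred):
    """Fraction by which the model lowers MAE relative to the baseline."""
    baseline_mae = mean_absolute_error(actual, baseline_pred)
    model_mae = mean_absolute_error(actual, model_pred)
    return (baseline_mae - model_mae) / baseline_mae


# Made-up slot-times (arbitrary units), not data from the paper.
actual = [10.0, 20.0, 30.0, 40.0]
baseline_pred = [25.0, 25.0, 25.0, 25.0]  # a constant "predict the mean" baseline
model_pred = [12.0, 18.0, 33.0, 37.0]

reduction = relative_mae_reduction(actual, model_pred, baseline_pred)
variance_explained = explained_variance(actual, model_pred)
```

Under these definitions, a reported 30-37% MAE reduction corresponds to `relative_mae_reduction` returning a value between 0.30 and 0.37.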

Abstract

Cloud data warehouses bill compute based on slot-time consumed. In shared multi-tenant environments, query cost is highly variable and hard to estimate before execution, causing budget overruns and degraded scheduling. Static query-planner heuristics fail to capture complex SQL structure, data skew, and workload contention. We present a feature-scoped machine learning approach that predicts BigQuery slot-time before execution using only pre-execution observable signals: a structured query complexity score derived from SQL operator costs, data volume features from planner estimates and workload metadata, and textual features from query text. We deliberately exclude runtime factors (slot-pool utilization, cache state, realized skew) unknowable at submission. The model uses a HistGradientBoostingRegressor trained on log-transformed slot-time, with a TF-IDF + TruncatedSVD-512 text pipeline…
