Redbench: Workload Synthesis From Cloud Traces
Johannes Wehrstein, Roman Heinrich, Mihail Stoian, Skander Krid, Martin Stemmer, Andreas Kipf, Carsten Binnig, Muhammad El-Hindi

TL;DR
Redbench is a new workload generator that creates realistic cloud data warehouse workloads by reproducing key characteristics from real traces, improving benchmarking accuracy and system optimization insights.
Contribution
Redbench introduces a workload synthesis method that captures real-world workload features, bridging the gap between synthetic benchmarks and actual cloud data warehouse workloads.
Findings
Redbench produces more realistic and reproducible workloads for cloud data warehouse benchmarking.
Redbench reveals the impact of system optimizations.
It transforms existing benchmarks into realistic query streams.
Abstract
Workload traces from cloud data warehouse providers reveal that standard benchmarks such as TPC-H and TPC-DS fail to capture key characteristics of real-world workloads, including query repetition and string-heavy queries. In this paper, we introduce Redbench, a novel benchmark featuring a workload generator that reproduces real-world workload characteristics derived from traces released by cloud providers. Redbench integrates multiple workload generation techniques to tailor workloads to specific objectives, transforming existing benchmarks into realistic query streams that preserve intrinsic workload characteristics. By focusing on inherent workload signals rather than execution-specific metrics, Redbench bridges the gap between synthetic and real workloads. Our evaluation shows that (1) Redbench produces more realistic and reproducible workloads for cloud data warehouse benchmarking,…
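To make the idea of turning a fixed benchmark into a query stream with a controlled amount of repetition concrete, here is a minimal sketch. It is not Redbench's actual generator: the function names `synthesize_stream` and `measured_repetition_rate`, the `repetition_rate` parameter, and the sampling strategy are all assumptions made for illustration. It only shows how a pool of benchmark queries (for example, the 22 TPC-H queries) could be replayed so that a chosen fraction of the stream repeats earlier queries.

```python
import random


def synthesize_stream(query_pool, length, repetition_rate, seed=0):
    """Hypothetical sketch, not the paper's method.

    Builds a stream of `length` queries drawn from `query_pool`. Each
    position repeats an earlier query with probability `repetition_rate`;
    otherwise it introduces a query not yet seen. Once the pool is
    exhausted, every further query is necessarily a repeat.
    """
    if not query_pool:
        raise ValueError("query_pool must not be empty")
    if not 0.0 <= repetition_rate <= 1.0:
        raise ValueError("repetition_rate must be in [0, 1]")

    rng = random.Random(seed)  # fixed seed keeps the stream reproducible
    unseen = list(query_pool)
    rng.shuffle(unseen)

    stream = []
    for _ in range(length):
        pool_exhausted = not unseen
        if stream and (pool_exhausted or rng.random() < repetition_rate):
            # Repeat a query that already appeared in the stream.
            stream.append(rng.choice(stream))
        else:
            # Introduce a query that has not appeared yet.
            stream.append(unseen.pop())
    return stream


def measured_repetition_rate(stream):
    """Fraction of queries in the stream that occurred earlier in it."""
    seen = set()
    repeats = 0
    for query in stream:
        if query in seen:
            repeats += 1
        seen.add(query)
    return repeats / len(stream) if stream else 0.0


if __name__ == "__main__":
    tpch_queries = [f"Q{i}" for i in range(1, 23)]  # labels only, no SQL
    stream = synthesize_stream(tpch_queries, length=200, repetition_rate=0.8)
    print(stream[:10])
    print(f"measured repetition rate: {measured_repetition_rate(stream):.2f}")
```

Because the pool is finite, the measured repetition rate can exceed the requested one on long streams. A generator that aims to match a trace would also need a way to create new query variants, which this sketch leaves out.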
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Software System Performance and Reliability
