Redbench: A Benchmark Reflecting Real Workloads
Skander Krid, Mihail Stoian, Andreas Kipf

TL;DR
Redbench is a new benchmark comprising 30 workloads that accurately reflect real-world query patterns and distribution shifts, addressing the gap between research benchmarks and industry needs.
Contribution
Redbench introduces 30 workloads grounded in workload characteristics observed in a real-world dataset (Redset), making benchmarks more realistic for developing industry-relevant learned components.
Findings
Redbench captures realistic workload distribution shifts.
It provides a diverse set of 30 workloads grounded in real-world workload characteristics.
Redbench enhances the evaluation of database components against real industry scenarios.
Abstract
Instance-optimized components have made their way into production systems. To some extent, this adoption is due to the characteristics of customer workloads, which can be individually leveraged during the model training phase. However, there is a gap between research and industry that impedes the development of realistic learned components: the lack of suitable workloads. Existing ones, such as TPC-H and TPC-DS, and even more recent ones, such as DSB and CAB, fail to exhibit real workload patterns, particularly distribution shifts. In this paper, we introduce Redbench, a collection of 30 workloads that reflect query patterns observed in the real world. The workloads were obtained by sampling queries from support benchmarks and aligning them with workload characteristics observed in Redset.
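The abstract says only that queries were sampled from support benchmarks and aligned with workload characteristics observed in Redset; it does not spell out the alignment procedure. As a rough illustration of what such an alignment could look like, here is a minimal sketch under stated assumptions, not the authors' method. It assumes each trace record carries a fingerprint that identifies repeated queries and a join count (`TraceQuery`, `fingerprint`, `num_joins`, and `align_workload` are all hypothetical names). Each new fingerprint is mapped to a randomly sampled benchmark query with the closest join count, and repeated fingerprints reuse that mapping, so the trace's repetition pattern carries over to the generated workload.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceQuery:
    # Hypothetical stand-in for one trace record: a fingerprint that
    # identifies repeated queries and the number of joins performed.
    fingerprint: str
    num_joins: int


def align_workload(trace, pool, seed=0):
    """Map each trace query to a benchmark query (illustrative sketch only).

    trace: list of TraceQuery in arrival order.
    pool:  dict mapping join count -> list of benchmark query identifiers.
    Repeated fingerprints reuse the same benchmark query, so repetition
    in the trace is preserved in the output workload.
    """
    rng = random.Random(seed)
    assigned = {}  # fingerprint -> chosen benchmark query
    workload = []
    for q in trace:
        if q.fingerprint not in assigned:
            # Use the join count in the pool closest to the trace query's.
            key = min(pool, key=lambda k: abs(k - q.num_joins))
            assigned[q.fingerprint] = rng.choice(pool[key])
        workload.append(assigned[q.fingerprint])
    return workload


trace = [TraceQuery("a", 2), TraceQuery("b", 0), TraceQuery("a", 2), TraceQuery("c", 5)]
pool = {0: ["Q0"], 2: ["Q2a", "Q2b"], 4: ["Q4"]}
print(align_workload(trace, pool))
```

Keying the mapping on the fingerprint, rather than sampling independently per arrival, is what lets repetition, and shifts in which queries repeat over time, survive the translation from trace to benchmark queries.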
