Zero-Shot Cost Models for Distributed Stream Processing
Roman Heinrich, Manisha Luthra, Harald Kornmayer, Carsten Binnig

TL;DR
This paper introduces a learned cost estimation model for Distributed Stream Processing Systems that accurately predicts performance metrics like latency and throughput, even under changing workloads and deployment conditions.
Contribution
The work presents a learned cost model that generalizes to dynamic streaming workloads, so a model trained once can guide the optimization of query execution, such as operator placement, without retraining.
Findings
Accurately predicts latency and throughput for unseen workloads.
Generalizes across real-world benchmarks and changing conditions.
Enables optimization tasks such as operator placement in distributed stream processing systems.
Abstract
This paper proposes a learned cost estimation model for Distributed Stream Processing Systems (DSPS) that aims to provide accurate cost predictions for executing queries. A major premise of this work is that the proposed learned model can generalize to the dynamics of streaming workloads out of the box. This means that a model, once trained, can accurately predict performance metrics such as latency and throughput even if the characteristics of the data and workload, or the deployment of operators to hardware, change at runtime. That way, the model can be used to solve tasks such as optimizing the placement of operators to minimize the end-to-end latency of a streaming query or to maximize its throughput, even under varying conditions. Our evaluation on a well-known DSPS, Apache Storm, shows that the model can predict accurately for unseen workloads and queries while generalizing across real-world…
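To make the operator-placement use case in the abstract concrete, here is a minimal sketch of how a cost model could drive a placement search. It is an illustration under stated assumptions, not the paper's method: `predict_latency_ms` is a hypothetical hand-written stand-in for a trained model, and the operator names, node names, costs, and link penalty are made-up values. The search is a plain exhaustive enumeration, which is only practical for very small queries.

```python
from itertools import product

# Hypothetical stand-in for a trained cost model (NOT the paper's model).
def predict_latency_ms(placement, operator_cost, node_speed, link_latency_ms):
    """Toy end-to-end latency estimate for a linear operator pipeline."""
    ops = list(placement)  # operators in pipeline order
    # Processing time: operator cost divided by the speed of its node.
    total = sum(operator_cost[op] / node_speed[placement[op]] for op in ops)
    # Network time: add a penalty when adjacent operators sit on different nodes.
    for upstream, downstream in zip(ops, ops[1:]):
        if placement[upstream] != placement[downstream]:
            total += link_latency_ms
    return total

def best_placement(operators, nodes, predict):
    """Score every operator-to-node assignment and keep the cheapest one."""
    best, best_cost = None, float("inf")
    for assignment in product(nodes, repeat=len(operators)):
        placement = dict(zip(operators, assignment))
        cost = predict(placement)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost

# Made-up example inputs (assumptions for illustration only).
operators = ["source", "filter", "aggregate", "sink"]
operator_cost = {"source": 1.0, "filter": 2.0, "aggregate": 8.0, "sink": 1.0}
node_speed = {"edge": 1.0, "cloud": 4.0}

placement, latency = best_placement(
    operators,
    list(node_speed),
    lambda p: predict_latency_ms(p, operator_cost, node_speed, link_latency_ms=3.0),
)
```

With these made-up numbers, the search places every operator on the faster node, since any split placement pays the link penalty. In practice, the predictor would be replaced by the learned model's estimate, and the exhaustive loop by a search strategy that scales to larger queries.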
