High-level Stream Processing: A Complementary Analysis of Fault Recovery
Adriano Vogel, Sören Henning, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser

TL;DR
This paper analyzes fault recovery in high-level stream processing frameworks, highlighting the potential for improvements and the need for better configuration management and abstractions for large-scale, real-time analytics systems.
Contribution
It extends fault recovery measurements with new experiments and explores the configuration space, emphasizing the need for transparent tuning abstractions in industry-scale deployments.
Findings
Significant potential for fault recovery improvements
Configuration complexity challenges in tuning
Need for new abstractions for scalable deployment
Abstract
Parallel computing is essential for accelerating software systems. Because a recurring challenge is processing high data volumes continuously, stream processing has emerged as both a paradigm and a software architectural style. Many software systems rely on stream processing to deliver scalable performance, and open-source frameworks provide coding abstractions and high-level parallel computing. Although stream processing performance has been studied extensively, fault tolerance, a key abstraction offered by stream processing frameworks, has still not been adequately measured with comprehensive testbeds. In this work, we extend previous fault recovery measurements with an exploratory analysis of the configuration space, additional experimental measurements, and an analysis of improvement opportunities. We focus on robust…
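The abstract does not say how fault recovery is quantified. One plausible way to express it is the time from an injected fault until throughput returns close to its pre-fault level. The Python sketch here is illustrative only. It assumes a hypothetical `recovery_time` helper, a baseline defined as the mean pre-fault throughput, and an arbitrary 90% recovery threshold. None of these come from the paper.

```python
from typing import Optional, Sequence


def recovery_time(
    timestamps: Sequence[float],
    throughput: Sequence[float],
    fault_time: float,
    threshold: float = 0.9,  # assumption: "recovered" means >= 90% of baseline
) -> Optional[float]:
    """Hypothetical metric: time from fault_time until throughput recovers.

    Baseline is the mean throughput observed strictly before fault_time
    (an assumption for illustration). Recovery is the first sample after
    throughput has dipped below threshold * baseline at which it is back
    at or above that target. Returns 0.0 if throughput never dips, and
    None if there are no pre-fault samples or it never recovers.
    """
    if len(timestamps) != len(throughput):
        raise ValueError("timestamps and throughput must have equal length")

    # Pre-fault samples define the steady-state baseline.
    before = [v for t, v in zip(timestamps, throughput) if t < fault_time]
    if not before:
        return None
    target = threshold * (sum(before) / len(before))

    dipped = False
    for t, v in zip(timestamps, throughput):
        if t < fault_time:
            continue
        if v < target:
            dipped = True            # throughput degraded after the fault
        elif dipped:
            return t - fault_time    # first sample back at the target level
    return None if dipped else 0.0   # never recovered, or never degraded
```

For example, with `timestamps = [0, 1, 2, 3, 4, 5, 6, 7, 8]`, per-second throughput `[100, 100, 100, 100, 20, 0, 50, 95, 100]`, and `fault_time=4.0`, the baseline is 100, the target is 90, and `recovery_time` returns `3.0`. The threshold-based definition avoids requiring an exact return to the baseline, which noisy real-world throughput rarely achieves.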
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Fault Detection and Control Systems · Software System Performance and Reliability
