Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications
Michal Bartoszkiewicz, Jan Chorowski, Adrian Kosowski, Jakub Kowalski, Sergey Kulik, Mateusz Lewandowski, Krzysztof Nowicki, Kamil Piechowiak, Olivier Ruas, Zuzanna Stamirowska, Przemyslaw Uznanski

TL;DR
Pathway is a unified data processing framework that handles both batch and streaming data, supports advanced analytics and machine learning workloads through a Python-friendly API, and reports benchmark results that surpass state-of-the-art industry frameworks.
Contribution
The paper introduces a unified framework powered by a distributed incremental dataflow engine written in Rust, which handles complex streaming analytics and outperforms existing industry solutions in the reported benchmarks.
Findings
Surpasses state-of-the-art frameworks in batch and streaming benchmarks.
Supports complex streaming algorithms like PageRank that are difficult for existing systems.
Provides a Table API for Python and Python/SQL workflows.
Abstract
We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machine-learning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both…
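To illustrate what "incremental dataflow" over bounded and unbounded streams can mean in practice, here is a minimal sketch in plain Python. It is not Pathway's API and does not reflect its Rust engine; the class `IncrementalSum`, the helpers `run_batch` and `run_streaming`, and the sensor data are all hypothetical names chosen for illustration. The sketch assumes updates arrive as `(key, value, diff)` triples, where `diff` is `+1` for an insertion and `-1` for a retraction, and shows that applying the same updates all at once or one at a time yields the same final state.

```python
from collections import defaultdict


class IncrementalSum:
    """Hypothetical operator: per-key sums maintained under inserts and retractions."""

    def __init__(self):
        self.totals = defaultdict(int)  # key -> running sum of values
        self.counts = defaultdict(int)  # key -> number of live rows

    def apply(self, updates):
        """Apply (key, value, diff) updates; return only the keys that changed."""
        touched = set()
        for key, value, diff in updates:
            self.totals[key] += diff * value
            self.counts[key] += diff
            touched.add(key)

        changed = {}
        for key in touched:
            if self.counts[key] == 0:
                # Every row for this key was retracted: drop it from the state.
                del self.totals[key]
                del self.counts[key]
                changed[key] = None
            else:
                changed[key] = self.totals[key]
        return changed

    def snapshot(self):
        """Return the current result as a plain dict."""
        return dict(self.totals)


def run_batch(updates):
    """Bounded input: process all updates in a single step."""
    op = IncrementalSum()
    op.apply(updates)
    return op.snapshot()


def run_streaming(updates):
    """Unbounded-style input: process updates one at a time as they arrive."""
    op = IncrementalSum()
    for update in updates:
        op.apply([update])
    return op.snapshot()


# Illustrative data only: three readings, one of which is later retracted.
events = [
    ("sensor_a", 3, +1),
    ("sensor_b", 5, +1),
    ("sensor_a", 4, +1),
    ("sensor_b", 5, -1),
]

batch_result = run_batch(events)
stream_result = run_streaming(events)
```

Both `batch_result` and `stream_result` hold the same per-key sums, which is the property a unified batch and streaming framework aims to guarantee. Each call to `apply` does work proportional to the size of the update rather than to the size of the accumulated state.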
Taxonomy
Topics: Data Stream Mining Techniques · Time Series Analysis and Forecasting · IoT and Edge/Fog Computing
