A Survey of State Management in Big Data Processing Systems
Quoc-Cuong To, Juan Soto, Volker Markl

TL;DR
This survey comprehensively reviews state management techniques in big data processing systems, highlighting their importance, categorizing approaches, and identifying open research challenges to guide future work.
Contribution
It introduces a taxonomy of state management methods, compares existing systems, and discusses new research directions in the field.
Findings
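State management techniques vary widely across systems such as Apache Flink, Apache Samza, Apache Spark, and Apache Storm.
The proposed taxonomy captures the many facets of state management and organizes the alternative approaches.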
Open problems remain in scalability and fault tolerance.
Abstract
State management and its use in diverse applications vary widely across big data processing systems. This is evident in both the research literature and existing systems, such as Apache Flink, Apache Samza, Apache Spark, and Apache Storm. Given the pivotal role that state management plays in various use cases, in this survey we present some of the most important uses of state as an enabler, discuss the alternative approaches used to handle and implement state, propose a taxonomy to capture the many facets of state management, and highlight new research directions. Our aim is to provide insight into disparate state management techniques, motivate others to pursue research in this area, and draw attention to some open problems.
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Software System Performance and Reliability
