A Survey of State Management in Big Data Processing Systems

Quoc-Cuong To; Juan Soto; Volker Markl

arXiv:1702.01596 · cs.DB · August 2, 2018 · 5 cites

TL;DR

This survey comprehensively reviews state management techniques in big data processing systems, highlighting their importance, categorizing approaches, and identifying open research challenges to guide future work.

Contribution

It introduces a taxonomy of state management methods, compares existing systems, and discusses new research directions in the field.

Findings

1. State management techniques vary widely across big data processing systems.
2. A taxonomy helps categorize the different approaches.
3. Open problems remain in scalability and fault tolerance.

Abstract

State management and its use in diverse applications vary widely across big data processing systems. This is evident in both the research literature and existing systems, such as Apache Flink, Apache Samza, Apache Spark, and Apache Storm. Given the pivotal role that state management plays in various use cases, in this survey, we present some of the most important uses of state as an enabler, discuss the alternative approaches used to handle and implement state, propose a taxonomy to capture the many facets of state management, and highlight new research directions. Our aim is to provide insight into disparate state management techniques, motivate others to pursue research in this area, and draw attention to some open problems.

Topics

Cloud Computing and Resource Management · Advanced Database Systems and Queries · Software System Performance and Reliability