LOG.io: Unified Rollback Recovery and Data Lineage Capture for Distributed Data Pipelines
Eric Simon, Renato B. Hoffmann, Lucas Alf, Dalvan Griebler

TL;DR
LOG.io offers a unified approach to rollback recovery and data lineage capture in distributed data pipelines. It supports non-deterministic operators and dynamic scaling, matches the Asynchronous Barrier Snapshotting (ABS) protocol during normal processing at moderate throughput, and outperforms it during recovery when straggler operators are present.
Contribution
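The paper introduces LOG.io, a log-based system that combines rollback recovery and fine-grained data lineage capture for serverless distributed pipelines. It accommodates non-deterministic operators, interactions with external systems, arbitrary custom code, and dynamic scaling of operators during pipeline execution.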
Findings
LOG.io performs as well as ABS during normal processing with moderate throughput.
LOG.io outperforms ABS during recovery when stragglers are present.
Data lineage capture overhead is less than 1.5% in all experiments.
Abstract
This paper introduces LOG.io, a comprehensive solution designed for correct rollback recovery and fine-grain data lineage capture in distributed data pipelines. It is tailored for serverless scalable architectures and uses a log-based rollback recovery protocol. LOG.io supports a general programming model, accommodating non-deterministic operators, interactions with external systems, and arbitrary custom code. It is non-blocking, allowing failed operators to recover independently without interrupting other active operators, thereby leveraging data parallelization, and it facilitates dynamic scaling of operators during pipeline execution. Performance evaluations, conducted within the SAP Data Intelligence system, compare LOG.io with the Asynchronous Barrier Snapshotting (ABS) protocol, originally implemented in Flink. Our experiments show that when there are straggler operators in a data…
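To make the idea of log-based rollback recovery concrete, the following is a minimal, generic sketch. It is not LOG.io's actual protocol, which this summary does not detail. It assumes a hypothetical `EventLog` (an in-memory stand-in for durable storage) and a hypothetical `NonDeterministicOperator`. It shows why logging outputs matters for non-deterministic operators: after a failure, the restarted operator answers from the log instead of recomputing, so downstream results stay consistent, and the same log entries double as lineage records.

```python
import random
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical durable log; in-memory dicts stand in for stable storage."""
    outputs: dict = field(default_factory=dict)   # input_id -> (output_id, value)
    lineage: dict = field(default_factory=dict)   # output_id -> list of input ids

    def record(self, input_id, output_id, value):
        self.outputs[input_id] = (output_id, value)
        self.lineage[output_id] = [input_id]


class NonDeterministicOperator:
    """Attaches a random number to each input, so plain recomputation
    would not reproduce outputs that were produced before a failure."""

    def __init__(self, log):
        self.log = log

    def process(self, input_id, payload):
        # An input that was already logged is answered from the log.
        if input_id in self.log.outputs:
            return self.log.outputs[input_id]
        output_id = f"out-{input_id}"
        value = (payload, random.random())
        # Log before emitting, so the output survives a crash of this operator.
        self.log.record(input_id, output_id, value)
        return output_id, value


log = EventLog()
op = NonDeterministicOperator(log)
first = [op.process(i, f"event-{i}") for i in range(3)]

# Simulate a failure: the operator instance is lost, the log survives.
recovered = NonDeterministicOperator(log)
replayed = [recovered.process(i, f"event-{i}") for i in range(3)]

assert replayed == first            # replay reproduces the original outputs
assert log.lineage["out-1"] == [1]  # lineage maps each output to its inputs
```

In this sketch only the failed operator replays its inputs. Other operators are untouched, which is the intuition behind a non-blocking recovery scheme.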
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Distributed systems and fault tolerance
