Toward Temporal Attribution Analytics in Dataflows
Chrysanthi Kosyfaki, Ruiyuan Zhang, Nikos Mamoulis, Xiaofang Zhou

TL;DR
This paper introduces temporal attribution, a lightweight form of provenance for stream data processing systems. Inspired by Temporal Interaction Networks (TINs), it enables scalable, time-focused dependency analysis without fine-grained, tuple-level metadata.
Contribution
The paper proposes temporal attribution for dataflows, classifies data into discrete and liquid types, defines five temporal provenance query types, and suggests indexing methods to improve scalability and practicality.
Findings
Demonstrates the applicability of Temporal Interaction Networks (TINs) to modeling data exchanges over time
Classifies data into discrete and liquid types for provenance analysis
Defines five temporal provenance query types
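
To illustrate what volume-based tracking over a TIN can look like, here is a minimal sketch, not the authors' implementation. It assumes each interaction is a hypothetical `(time, src, dst, quantity)` record and that quantities leaving a node are attributed to their origins proportionally; the function name `attribute_volumes`, the record layout, and the proportional policy are all assumptions made for illustration.

```python
from collections import defaultdict


def attribute_volumes(interactions):
    """Track, per node, how much of its held quantity originated at each source.

    interactions: iterable of (time, src, dst, quantity) tuples (assumed layout).
    Policy (assumed): a transfer draws proportionally from the origins currently
    held at src; any shortfall counts as newly generated at src.
    """
    # buffers[node][origin] = quantity held at node that originated at origin
    buffers = defaultdict(lambda: defaultdict(float))
    for _time, src, dst, qty in sorted(interactions):
        held = sum(buffers[src].values())
        moved = min(qty, held)
        if held > 0:
            # Snapshot the items so the buffer can be updated while iterating.
            for origin, amount in list(buffers[src].items()):
                share = moved * amount / held
                buffers[src][origin] -= share
                buffers[dst][origin] += share
        # Quantity that src did not hold is treated as generated at src.
        buffers[dst][src] += qty - moved
    # Drop empty entries so the result only lists real attributions.
    return {
        node: {o: a for o, a in origins.items() if a > 1e-12}
        for node, origins in buffers.items()
        if any(a > 1e-12 for a in origins.values())
    }


# Toy example: A and C feed B, then B forwards half of what it holds to D.
result = attribute_volumes([
    (1, "A", "B", 4.0),
    (2, "C", "B", 6.0),
    (3, "B", "D", 5.0),
])
print(result["D"])  # {'A': 2.0, 'C': 3.0}
```

The sketch stores only aggregate quantities per origin rather than per-tuple lineage, which is the kind of saving the summary attributes to temporal attribution. Other selection policies could replace the proportional split.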
Abstract
Data provenance (the process of determining the origin and derivation of data outputs) has applications across multiple domains including explaining database query results and auditing scientific workflows. Despite decades of research, provenance tracing remains challenging due to its high computational cost and storage requirements. In streaming systems such as Apache Flink, fine-grained provenance graphs can grow super-linearly with data volume, posing significant scalability challenges. We define temporal attribution, a new lightweight form of provenance, appropriate for certain tasks, such as monitoring dependencies between system components over time quantitatively. Temporal attribution enables time-focused analysis that does not require fine-grained, tuple-level dependency meta-data. Inspired by volume-based provenance tracking in Temporal Interaction Networks (TINs), we…
