Continuous Integration of Data Histories into Consistent Namespaces
Mark Burgess, Andras Gerlits

TL;DR
This paper introduces a policy-based, scalable approach for integrating data histories into consistent, versioned namespaces, enabling reliable and controlled data flow management in shared data services.
Contribution
It presents a novel hierarchical data pipeline system that ensures consistent semantics and scalable global ordering through a versioned namespace with rate-limited versioning.
Findings
Establishes a global ordering invariant using a spanning tree over data shards.
Demonstrates controlled scalability and reliable 'latest version' semantics.
Provides a self-protecting, rate-limited versioning mechanism.
Abstract
We describe a policy-based approach to the scaling of shared data services, using a hierarchy of calibrated data pipelines to automate the continuous integration of data flows. While there is no unique solution to the problem of time order, we show how to use a fair interleaving to reproduce reliable 'latest version' semantics in a controlled way, by trading locality for temporal resolution. We thus establish an invariant global ordering from a spanning tree over all shards, with controlled scalability. This forms a versioned coordinate system (or versioned namespace) with consistent semantics and self-protecting rate-limited versioning, analogous to publish-subscribe addressing schemes for Content Delivery Networks (CDN) or Named Data Networking (NDN).
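As a rough illustration of the idea in the abstract, the sketch below merges per-shard histories up a tree using a simple round-robin interleave and then numbers the merged records to give one global version sequence. It is not the authors' implementation: round-robin is only one possible reading of "fair interleaving", the nested tuple/list encoding of the spanning tree is an assumption, and the names `fair_interleave`, `integrate`, and `versioned` are hypothetical.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted stream


def fair_interleave(*streams):
    """Round-robin merge: take at most one record from each stream per round."""
    for round_items in zip_longest(*streams, fillvalue=_MISSING):
        for item in round_items:
            if item is not _MISSING:
                yield item


def integrate(tree):
    """Merge a tree of shards into a single ordered stream.

    Assumed encoding: a leaf is a list of records (one shard's local history);
    an internal node is a tuple of child subtrees.
    """
    if isinstance(tree, list):
        return iter(tree)
    return fair_interleave(*(integrate(child) for child in tree))


def versioned(tree):
    """Assign consecutive global version numbers to the merged stream."""
    return list(enumerate(integrate(tree), start=1))


# Two shards under one intermediate node, plus a third shard at the root.
shards = ((["a1", "a2"], ["b1"]), ["c1", "c2"])
history = versioned(shards)
print(history)
# [(1, 'a1'), (2, 'c1'), (3, 'b1'), (4, 'c2'), (5, 'a2')]
```

The order within each shard is preserved (`a1` before `a2`, `c1` before `c2`), and the same tree always yields the same global sequence, which is the sense in which the ordering is invariant in this toy model. The "latest version" is simply the record with the highest number. Rate limiting and pipeline calibration are not modelled here.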
