Continuous Integration of Data Histories into Consistent Namespaces

Mark Burgess; Andras Gerlits

arXiv:2204.00470·cs.DC·April 4, 2022

TL;DR

This paper introduces a policy-based, scalable approach for integrating data histories into consistent, versioned namespaces, enabling reliable and controlled data flow management in shared data services.

Contribution

It presents a novel hierarchical data pipeline system that ensures consistent semantics and scalable global ordering through a versioned namespace with rate-limited versioning.

Findings

01. Establishes a global ordering invariant using a spanning tree over data shards.

02. Demonstrates controlled scalability and reliable 'latest version' semantics.

03. Provides a self-protecting, rate-limited versioning mechanism.

Abstract

We describe a policy-based approach to the scaling of shared data services, using a hierarchy of calibrated data pipelines to automate the continuous integration of data flows. While there is no unique solution to the problem of time order, we show how to use a fair interleaving to reproduce reliable 'latest version' semantics in a controlled way, by trading locality for temporal resolution. We thus establish an invariant global ordering from a spanning tree over all shards, with controlled scalability. This forms a versioned coordinate system (or versioned namespace) with consistent semantics and self-protecting rate-limited versioning, analogous to the publish-subscribe addressing schemes used in Content Delivery Networks (CDN) or Named Data Networking (NDN).
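
The abstract does not spell out the interleaving procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "fair interleaving" can be illustrated by a plain round-robin merge, that the spanning tree is given as nested tuples whose leaves are per-shard record lists, and that versions are consecutive integers. The names `fair_interleave`, `integrate`, and `assign_versions` are hypothetical, and rate limiting is not modelled.

```python
from itertools import zip_longest

# Sentinel marking "this shard has no record left in this round".
_MISSING = object()


def fair_interleave(streams):
    """Round-robin merge: one record per stream per round, in fixed stream order.

    Assumption: round-robin stands in for the paper's fair interleaving.
    """
    merged = []
    for round_items in zip_longest(*streams, fillvalue=_MISSING):
        merged.extend(item for item in round_items if item is not _MISSING)
    return merged


def integrate(tree):
    """Merge a spanning tree of shards into a single ordered history.

    A leaf is a list of records from one shard (already in local order).
    An internal node is a tuple of child nodes, merged bottom-up.
    """
    if isinstance(tree, list):
        return list(tree)
    return fair_interleave([integrate(child) for child in tree])


def assign_versions(history):
    """Map consecutive version numbers (starting at 1) onto the merged history."""
    return {version: record for version, record in enumerate(history, start=1)}


# Hypothetical topology: shards A and B share a parent, which is merged with shard C.
tree = ((["a1", "a2", "a3"], ["b1"]), ["c1", "c2"])
history = integrate(tree)
versions = assign_versions(history)
latest = versions[max(versions)]  # record holding the highest version number
```

Because the merge order depends only on the tree shape and on each shard's local order, re-running `integrate` on the same inputs yields the same global sequence. Each shard's records keep their relative order, but records from different shards are ordered by round rather than by wall-clock time, which is one way to read the abstract's trade of locality for temporal resolution.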
