Programmable Dataflows: Abstraction and Programming Model for Data Sharing
Siyuan Xia, Chris Zhu, Tapan Srivastava, Bridget Fahey, Raul Castro Fernandez

TL;DR
This paper introduces the contract programming model (CPM), a programming model for flexible, secure, and efficient data sharing across applications that addresses the bespoke, cost-intensive nature of existing solutions.
Contribution
It presents a general data sharing abstraction, the contract, distilled from a model of data sharing as a sequence of dataflows, and implements CPM with optimizations for practical, privacy-aware data sharing applications.
Findings
The contract abstraction effectively models diverse data sharing problems.
CPM enables complex data sharing programs with qualitative improvements.
Optimizations significantly improve the efficiency of data sharing programs.
Abstract
Data sharing is central to a wide variety of applications such as fraud detection, ad matching, and research. The lack of data sharing abstractions makes the solution to each data sharing problem bespoke and cost-intensive, hampering value generation. In this paper, we first introduce a data sharing model to represent every data sharing problem with a sequence of dataflows. From the model, we distill an abstraction, the contract, which agents use to communicate the intent of a dataflow and evaluate its consequences, before the dataflow takes place. This helps agents move towards a common sharing goal without violating any regulatory and privacy constraints. Then, we design and implement the contract programming model (CPM), which allows agents to program data sharing applications catered to each problem's needs. Contracts permit data sharing, but their interactive nature may introduce…
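The idea that agents agree on a dataflow before it takes place can be illustrated with a small sketch. This is not the paper's CPM API: the `Contract` class, `run_dataflow`, `shared_accounts`, and the agent names `bank_a` and `bank_b` are all hypothetical, chosen only to show a dataflow that stays blocked until every agent contributing data has approved it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Contract:
    """Hypothetical contract: states the intent of one dataflow up front."""
    sources: List[str]               # agents whose data the dataflow reads
    function: Callable[..., object]  # computation applied to that data
    recipients: List[str]            # agents who would receive the output
    approvals: Set[str] = field(default_factory=set)

    def approve(self, agent: str) -> None:
        # Only agents that contribute data get a say in this sketch.
        if agent not in self.sources:
            raise ValueError(f"{agent} is not a data source of this contract")
        self.approvals.add(agent)

    def is_approved(self) -> bool:
        return self.approvals == set(self.sources)


def run_dataflow(contract: Contract, data: Dict[str, list]) -> Dict[str, object]:
    """Execute the dataflow only after every source agent has approved."""
    if not contract.is_approved():
        raise PermissionError("dataflow blocked: not every source agent approved")
    result = contract.function(*(data[s] for s in contract.sources))
    return {recipient: result for recipient in contract.recipients}


def shared_accounts(a: list, b: list) -> list:
    # Example computation (assumed): account IDs flagged by both parties.
    return sorted(set(a) & set(b))


data = {"bank_a": [1, 2, 3], "bank_b": [2, 3, 4]}
contract = Contract(
    sources=["bank_a", "bank_b"],
    function=shared_accounts,
    recipients=["bank_a"],
)
contract.approve("bank_a")
contract.approve("bank_b")
print(run_dataflow(contract, data))  # {'bank_a': [2, 3]}
```

In this sketch, each agent can inspect the contract's function and recipients before approving, which is a simplified stand-in for evaluating a dataflow's consequences ahead of time.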
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management
