ProvLet: A Provenance Management Service for Long Tail Microscopy Data
Hessam Moeini, Todd Nicholson, Klara Nahrstedt, Gianni Pezzarossi

TL;DR
ProvLet is a new provenance management service designed for long-tail microscopy data, addressing challenges of frequent data collection, diversity, and scalability by organizing data at higher abstractions and providing visualization tools.
Contribution
The paper introduces ProvLet, a novel provenance service that manages diverse, high-frequency provenance data in long-tail microscopy (LTM) data management systems with low overhead and a scalable architecture.
Findings
ProvLet achieves low system overhead when managing long-tail microscopy data.
It effectively organizes provenance data at higher data abstractions.
ProvLet enables scalable provenance management over six years of microscopy data.
Abstract
Provenance management must be present to enhance the overall security and reliability of long-tail microscopy (LTM) data management systems. However, there are challenges in provenance for domains with LTM data. The provenance data need to be collected more frequently, which increases system overheads (in terms of computation and storage) and results in scalability issues. Moreover, in most scientific application domains a provenance solution must consider network-related events as well. Therefore, provenance data in LTM data management systems are highly diverse and must be organized and processed carefully. In this paper, we introduce a novel provenance service, called ProvLet, to collect, distribute, analyze, and visualize provenance data in LTM data management systems. This means (1) we address how to filter and store the desired transactions on disk; (2) we consider a data…
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Distributed and Parallel Computing Systems
