Inspector: A Data Provenance Library for Multithreaded Programs
Jörg Thalheim, Pramod Bhatotia, Christof Fetzer

TL;DR
Inspector is a transparent, easy-to-integrate data provenance library for multithreaded programs that records execution dependencies to enhance system dependability, security, and efficiency with reasonable overheads.
Contribution
Inspector introduces a novel parallel provenance algorithm that operates at the binary level, so it can be integrated into existing multithreaded applications without recompilation.
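The summary does not describe how the algorithm stays parallel. As a hedged illustration only, the sketch below keeps a private log per thread and uses vector clocks at lock operations to capture schedule dependencies. Vector clocks are a standard happens-before technique swapped in here for illustration; they are not necessarily what Inspector does. `ThunkRecord`, `Recorder`, `TracedLock`, and `happens_before` are hypothetical names.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ThunkRecord:
    tid: int       # index of the thread that executed the thunk
    seq: int       # position of the thunk in that thread's private log
    clock: tuple   # vector clock at the end of the thunk


def happens_before(a, b):
    """True if thunk a is ordered before thunk b (componentwise <=, not equal)."""
    return all(x <= y for x, y in zip(a.clock, b.clock)) and a.clock != b.clock


class Recorder:
    """Hypothetical recorder: one private log and one vector clock per thread,
    so threads never contend on a shared log while recording."""

    def __init__(self, nthreads):
        self.n = nthreads
        self.logs = [[] for _ in range(nthreads)]
        self.clocks = [[0] * nthreads for _ in range(nthreads)]

    def end_thunk(self, tid):
        # Close the current thunk of thread `tid`: tick its own clock entry
        # and append a record to its private log (touched only by `tid`).
        clock = self.clocks[tid]
        clock[tid] += 1
        log = self.logs[tid]
        log.append(ThunkRecord(tid, len(log), tuple(clock)))


class TracedLock:
    """Hypothetical lock wrapper that turns lock hand-offs into schedule dependencies."""

    def __init__(self, recorder):
        self._lock = threading.Lock()
        self._rec = recorder
        self._release_clock = [0] * recorder.n  # only accessed while holding self._lock

    def acquire(self, tid):
        self._rec.end_thunk(tid)   # a synchronization point ends the current thunk
        self._lock.acquire()
        mine = self._rec.clocks[tid]
        for i, v in enumerate(self._release_clock):
            mine[i] = max(mine[i], v)   # inherit the last releaser's history

    def release(self, tid):
        self._rec.end_thunk(tid)   # closes the critical-section thunk
        self._release_clock = list(self._rec.clocks[tid])
        self._lock.release()


def worker(tid, lock, shared, iters):
    for _ in range(iters):
        lock.acquire(tid)
        shared["counter"] += 1
        lock.release(tid)


N, ITERS = 3, 5
rec = Recorder(N)
lock = TracedLock(rec)
shared = {"counter": 0}
threads = [threading.Thread(target=worker, args=(t, lock, shared, ITERS)) for t in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Odd positions in each log are critical sections; even ones end at an acquire.
crit = [r for log in rec.logs for r in log if r.seq % 2 == 1]
```

Because each thread appends only to its own log, recording needs no global lock, and cross-thread order is recovered afterwards from the clocks: any two critical sections on the same lock come out ordered, while each thread's first thunk stays concurrent with the others. Under CPython's global interpreter lock this shows the structure of the idea, not a speedup.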
Findings
Reasonable provenance overheads on multicore benchmarks
Effective recording of control, data, and schedule dependencies
Demonstrated improvements in system dependability, security, and efficiency
Abstract
Data provenance strives to explain how a computation was performed by recording a trace of its execution. The provenance trace is useful across a wide range of workflows to improve the dependability, security, and efficiency of software systems. In this paper, we present Inspector, a POSIX-compliant data provenance library for shared-memory multithreaded programs. The Inspector library is completely transparent and easy to use: it can replace the pthreads library through a simple exchange of the linked libraries, without even recompiling the application code. To achieve this, we present a parallel provenance algorithm that records control, data, and schedule dependencies using a Concurrent Provenance Graph (CPG). We implemented our algorithm to operate at the compiled binary code level by leveraging a combination of OS-specific mechanisms, and recently released…
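The abstract names the three kinds of dependencies a Concurrent Provenance Graph records but not the graph's exact layout. The toy sketch below assumes, as our own simplification, that nodes are thunks (the code a thread runs between two synchronization operations), each annotated with the memory locations it read and wrote. `Thunk`, `build_cpg`, and `provenance` are hypothetical names, and the trace is written by hand rather than recorded from a binary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Thunk:
    id: str
    tid: int
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def build_cpg(thunks, schedule_edges):
    """Return control, schedule, and data edges of a toy provenance graph.

    thunks: listed in program order within each thread.
    schedule_edges: (releasing thunk, acquiring thunk) pairs observed at runtime.
    """
    edges = {"control": set(), "schedule": set(schedule_edges), "data": set()}

    # Control edges: consecutive thunks of the same thread.
    last = {}
    for t in thunks:
        if t.tid in last:
            edges["control"].add((last[t.tid], t.id))
        last[t.tid] = t.id

    # Happens-before = reachability over control and schedule edges.
    succ = defaultdict(set)
    for a, b in edges["control"] | edges["schedule"]:
        succ[a].add(b)

    def reaches(src, dst):
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(succ[node])
        return False

    # Data edges: a writer ordered before a reader of the same location.
    # Simplification: every such writer is linked, not only the most recent one.
    for w in thunks:
        for r in thunks:
            if w.id != r.id and w.writes & r.reads and reaches(w.id, r.id):
                edges["data"].add((w.id, r.id))
    return edges


def provenance(edges, target):
    """All thunks that `target` transitively depends on, via any edge kind."""
    pred = defaultdict(set)
    for kind in edges.values():
        for a, b in kind:
            pred[b].add(a)
    result, stack = set(), [target]
    while stack:
        for p in pred[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


# T0: a0 writes x; a1 (critical section) reads x, writes y, then unlocks.
# T1: b0 writes z and reads x without synchronization; b1 locks after a1, reads y and z.
thunks = [
    Thunk("a0", 0, writes=frozenset({"x"})),
    Thunk("a1", 0, reads=frozenset({"x"}), writes=frozenset({"y"})),
    Thunk("b0", 1, reads=frozenset({"x"}), writes=frozenset({"z"})),
    Thunk("b1", 1, reads=frozenset({"y", "z"})),
]
edges = build_cpg(thunks, [("a1", "b1")])
```

Here `provenance(edges, "b1")` returns `{"a0", "a1", "b0"}`, the backward slice a debugging or auditing workflow would inspect. The unsynchronized read of `x` in `b0` gets no data edge because nothing orders `a0` before `b0`; this sketch simply ignores such racy accesses.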
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
