Provenance Tracking in Large-Scale Machine Learning Systems
Gabriele Padovani, Valentine Anantharaj, Sandro Fiore

TL;DR
This paper introduces yProv4ML, a flexible provenance collection library for large-scale machine learning, enabling better resource management, reproducibility, and efficiency analysis in AI workflows.
Contribution
The paper presents yProv4ML, a novel, extensible provenance collection tool designed specifically for large-scale machine learning systems that integrates with existing frameworks.
Findings
yProv4ML effectively collects provenance data in JSON format.
The library is compatible with W3C PROV and ProvML standards.
It enhances reproducibility and resource analysis in ML workflows.
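To make the findings above more concrete, here is a minimal sketch of what a W3C PROV-style JSON record for a single training run could look like. It uses only the Python standard library and does not call yProv4ML; the function name `build_provenance_record`, the `ex:` namespace, and every identifier, parameter, and metric value are hypothetical and chosen purely for illustration. The top-level keys (`prefix`, `entity`, `activity`, `used`, `wasGeneratedBy`) follow the general layout of the PROV-JSON serialization, but the actual output of the library may differ.

```python
import json


def build_provenance_record(run_id, params, metrics):
    """Assemble a PROV-JSON-style document for one training run.

    Illustrative only: this is NOT the yProv4ML API, and the "ex:" names
    are made-up placeholders.
    """
    activity = f"ex:{run_id}"          # the training run (a PROV activity)
    dataset = "ex:training_dataset"    # input data (a PROV entity)
    model = f"ex:{run_id}_model"       # trained model (a PROV entity)

    return {
        "prefix": {"ex": "http://example.org/"},
        "entity": {
            dataset: {"prov:type": "ex:Dataset"},
            # Attach the reported metrics to the model entity.
            model: {
                "prov:type": "ex:Model",
                **{f"ex:{name}": value for name, value in metrics.items()},
            },
        },
        "activity": {
            # Attach the hyperparameters to the training activity.
            activity: {f"ex:{name}": value for name, value in params.items()},
        },
        # The training activity used the dataset...
        "used": {"_:u1": {"prov:activity": activity, "prov:entity": dataset}},
        # ...and the model was generated by the training activity.
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": model, "prov:activity": activity}
        },
    }


record = build_provenance_record(
    "run_001",
    params={"learning_rate": 0.001, "epochs": 3},
    metrics={"final_loss": 0.42},
)
serialized = json.dumps(record, indent=2)
print(serialized)
```

Keeping hyperparameters on the activity and metrics on the generated entity is one possible modeling choice; it lets a later analysis trace which run produced which model from the `used` and `wasGeneratedBy` relations alone.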
Abstract
As the demand for large-scale AI models continues to grow, optimizing their training to balance computational efficiency, execution time, accuracy, and energy consumption represents a critical multidimensional challenge. Achieving this balance requires not only innovative algorithmic techniques and hardware architectures but also comprehensive tools for monitoring, analyzing, and understanding the underlying processes involved in model training and deployment. Provenance data (information about the origins, context, and transformations of data and processes) has become a key component in this pursuit. By leveraging provenance, researchers and engineers can gain insights into resource usage patterns, identify inefficiencies, and ensure reproducibility and accountability in AI development workflows. For this reason, the question of how distributed resources can be optimally utilized…
Taxonomy
Topics: Scientific Computing and Data Management · Machine Learning in Materials Science · Software System Performance and Reliability
