ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems
Nguyen Khoi Tran, Bushra Sabir, M. Ali Babar, Nini Cui, Mehran Abolhasan, Justin Lipman

TL;DR
ProML is a decentralised platform that uses blockchain technology to securely manage and verify the provenance of machine learning assets across distributed teams, enhancing security and trust without relying on a central authority.
Contribution
The paper introduces a novel Artefact-as-a-State-Machine architecture and a user-driven provenance-capturing mechanism for decentralised ML asset management.
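To make the Artefact-as-a-State-Machine idea more concrete, the following is a minimal in-memory sketch, not the paper's implementation. It assumes that each ML asset moves through a small set of lifecycle states and that every transition is appended to a hash-chained log, which loosely mimics how a blockchain makes history tamper-evident. The state names, the `TRANSITIONS` table, and the `Artefact` and `verify` helpers are all hypothetical and chosen only for illustration; ProML itself relies on smart contracts rather than a local Python object.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical lifecycle: (current state, action) -> next state.
# These names are illustrative assumptions, not taken from the paper.
TRANSITIONS = {
    ("registered", "train"): "trained",
    ("trained", "evaluate"): "evaluated",
    ("evaluated", "deploy"): "deployed",
}

GENESIS = "0" * 64  # placeholder hash preceding the first record


def _digest(record: dict) -> str:
    """SHA-256 over every field of the record except its own hash."""
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class Artefact:
    artefact_id: str
    state: str = "registered"
    log: list = field(default_factory=list)

    def apply(self, action: str, actor: str) -> dict:
        """Perform a transition and append a provenance record to the log."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"action {action!r} not allowed in state {self.state!r}")
        record = {
            "artefact": self.artefact_id,
            "actor": actor,
            "action": action,
            "from": self.state,
            "to": TRANSITIONS[key],
            "prev": self.log[-1]["hash"] if self.log else GENESIS,
        }
        record["hash"] = _digest(record)
        self.state = record["to"]
        self.log.append(record)
        return record


def verify(log: list) -> bool:
    """Check that each record links to its predecessor and is unmodified."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["hash"] != _digest(record):
            return False
        prev = record["hash"]
    return True
```

In this sketch, calling `Artefact("model-1").apply("train", "alice")` records who performed which action, and `verify` returns `False` if any earlier record is edited afterwards. A single-process log like this offers no protection against whoever holds it; the decentralised setting described in the paper is what removes that single point of trust.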
Findings
ProML effectively manages ML provenance with acceptable performance overheads.
The system enhances security against insider threats in distributed ML workflows.
ProML's security model withstands simulated attack scenarios.
Abstract
Large-scale Machine Learning (ML) based Software Systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fairness. Unfortunately, it is challenging for ML teams to access and reconstruct such historical information of ML assets (ML provenance) because it is generally fragmented across distributed ML teams and threatened by the same adversaries that attack ML assets. This paper proposes ProML, a decentralised platform that leverages blockchain and smart contracts to empower distributed ML teams to jointly manage a single source of truth about circulated ML assets' provenance without relying on a third…
Taxonomy
Topics: Scientific Computing and Data Management · Cloud Data Security Solutions · Data Quality and Management
