Providing Assurance and Scrutability on Shared Data and Machine Learning Models with Verifiable Credentials
Iain Barclay, Alun Preece, Ian Taylor, Swapna K. Radha, Jarek Nabrzyski

TL;DR
This paper presents a system that uses verifiable credentials and a supply chain record to enhance trust, transparency, and scrutiny of data and models in AI and ML applications, especially in sensitive fields.
Contribution
It introduces a novel architecture leveraging self-sovereign identity principles to create verifiable records of data qualities and model components for improved trust and accountability.
Findings
The system enables traceability of data origins and qualities.
Practitioners can scrutinize ML model components for biases or issues.
The approach enhances trustworthiness in AI development and deployment.
Abstract
Adopting shared data resources requires scientists to place trust in the originators of the data. When shared data is later used in the development of artificial intelligence (AI) systems or machine learning (ML) models, the trust lineage extends to the users of the system, typically practitioners in fields such as healthcare and finance. Practitioners rely on AI developers to have used relevant, trustworthy data, but may have limited insight and recourse. This paper introduces a software architecture and implementation of a system based on design patterns from the field of self-sovereign identity. Scientists can issue signed credentials attesting to qualities of their data resources. Data contributions to ML models are recorded in a bill of materials (BOM), which is stored with the model as a verifiable credential. The BOM provides a traceable record of the supply chain for an AI…
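The flow described in the abstract (a data originator signs a credential about a dataset, and a model's bill of materials bundles those credentials into a signed record) can be illustrated with a minimal sketch. This is not the paper's implementation: real verifiable credentials rely on public-key signatures and decentralized identifiers, whereas this sketch substitutes a shared-secret HMAC from the Python standard library so that it stays self-contained. All names here (`issue_credential`, `verify_credential`, `build_bom`, `audit_bom`, and the field names) are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import json


def sign(payload: dict, key: bytes) -> str:
    """Sign a payload. ASSUMPTION: HMAC stands in for a real digital signature."""
    # Canonical JSON so the issuer and the verifier hash identical bytes.
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def issue_credential(issuer: str, subject: str, claims: dict, key: bytes) -> dict:
    """Create a credential in which `issuer` attests `claims` about `subject`."""
    body = {"issuer": issuer, "subject": subject, "claims": claims}
    return {**body, "proof": sign(body, key)}


def verify_credential(credential: dict, key: bytes) -> bool:
    """Recompute the proof over everything except the proof itself."""
    body = {k: v for k, v in credential.items() if k != "proof"}
    return hmac.compare_digest(sign(body, key), credential.get("proof", ""))


def build_bom(model_id: str, dataset_credentials: list, developer: str, key: bytes) -> dict:
    """Bundle dataset credentials into a BOM that is itself a signed credential."""
    return issue_credential(developer, model_id, {"datasets": dataset_credentials}, key)


def audit_bom(bom: dict, keys: dict) -> dict:
    """Check the BOM proof, then each dataset credential listed inside it.

    `keys` maps an issuer name to its key (hypothetical key registry).
    Returns a mapping of dataset subject to verification result.
    """
    bom_key = keys.get(bom["issuer"])
    if bom_key is None or not verify_credential(bom, bom_key):
        raise ValueError("BOM signature invalid or issuer unknown")
    results = {}
    for cred in bom["claims"]["datasets"]:
        issuer_key = keys.get(cred["issuer"])
        results[cred["subject"]] = (
            issuer_key is not None and verify_credential(cred, issuer_key)
        )
    return results
```

A practitioner holding the BOM and the issuers' keys could call `audit_bom(bom, keys)` to see which data contributions carry a valid attestation. Because the BOM proof covers the embedded dataset credentials, altering any of them after signing also invalidates the BOM itself.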
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Scientific Computing and Data Management
