Mitigating Downstream Model Risks via Model Provenance
Keyu Wang, Abdullah Norozi Iranzad, Scott Schaffter, Meg Risdal, Doina Precup, Jonathan Lebensold

TL;DR
This paper introduces a standardized, machine-readable model provenance system to improve transparency, traceability, and management of foundation models throughout their lifecycle, addressing current gaps in model documentation and governance.
Contribution
It proposes a new model specification format and a unified model record repository to enhance model provenance tracking and facilitate adoption across industry and open-source communities.
Findings
Developed a machine-readable model specification format.
Created a semantically versioned model record repository.
Demonstrated improved transparency and traceability in model management.
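To make the two artifacts above more concrete, here is a minimal sketch of what a machine-readable model record and a semantically versioned record repository could look like. It is not the paper's specification format: the `ModelRecord` fields, the `name@version` key convention, and the `RecordRepository` class are all hypothetical names chosen for illustration, and versions are assumed to be plain `MAJOR.MINOR.PATCH` strings.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ModelRecord:
    # Hypothetical fields for illustration; not the schema proposed in the paper.
    name: str
    version: str                   # assumed "MAJOR.MINOR.PATCH"
    parent: Optional[str] = None   # "name@version" of the model this one derives from
    licenses: Tuple[str, ...] = ()

    @property
    def key(self) -> str:
        return f"{self.name}@{self.version}"


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Split "MAJOR.MINOR.PATCH" into integers; raises ValueError if malformed."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def to_json(record: ModelRecord) -> str:
    """Serialize a record to a stable, machine-readable JSON string."""
    return json.dumps(asdict(record), sort_keys=True)


class RecordRepository:
    """In-memory stand-in for a centralized model record repository."""

    def __init__(self) -> None:
        self._records: Dict[str, ModelRecord] = {}

    def add(self, record: ModelRecord) -> None:
        parse_semver(record.version)  # reject malformed versions early
        if record.key in self._records:
            raise ValueError(f"record already exists: {record.key}")
        self._records[record.key] = record

    def latest(self, name: str) -> ModelRecord:
        """Return the highest version of a model, compared numerically."""
        candidates = [r for r in self._records.values() if r.name == name]
        return max(candidates, key=lambda r: parse_semver(r.version))

    def lineage(self, key: str) -> List[str]:
        """Follow parent links from a record back to its earliest known ancestor."""
        chain: List[str] = []
        seen = set()
        current: Optional[str] = key
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle in lineage at {current}")
            seen.add(current)
            chain.append(current)
            record = self._records.get(current)
            current = record.parent if record else None
        return chain
```

Comparing versions as integer tuples rather than strings matters here: a string comparison would rank `1.2.0` above `1.10.0`. The `lineage` walk stops at the first parent that is not registered, so an incomplete genealogy still yields the known part of the chain.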
Abstract
Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing model genealogy, enabling machine readability, offering reliable centralized management systems, and fostering consistent creation incentives. This challenge mirrors issues in software supply chain security, but AI/ML remains at an earlier stage of maturity. Addressing these gaps requires industry-standard tooling that can be adopted by foundation model publishers, open-source model innovators, and major distribution platforms. We propose a machine-readable model specification format to simplify…
Taxonomy
Topics: Scientific Computing and Data Management · Software Testing and Debugging Techniques · Model-Driven Software Engineering Techniques
Methods: Sparse Evolutionary Training
