The Grand Software Supply Chain of AI Systems
Carmine Cesarano, Martin Monperrus

TL;DR
This paper analyzes the AI software supply chain across four architectural layers, identifies structural gaps in verifiability, versioning, observability, and traceability, and measures the supply chain's scale on a reference stack of open-source projects.
Contribution
It introduces a layered analysis of the AI supply chain and identifies key structural gaps not addressed by traditional mechanisms.
Findings
The measured AI supply chain includes 4,664 direct dependencies and 11,508 transitive packages.
Current AI systems lack verifiability, versioning, observability, and traceability.
The reference stack comprises roughly 392 million lines of code.
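
The distinction between direct and transitive dependencies in the findings above can be illustrated with a small graph traversal. This is a minimal sketch on a hypothetical toy graph, not the paper's measurement method: the package names, `TOY_GRAPH`, and both helper functions are assumptions made for illustration. This summary also does not say whether the paper's transitive count includes the direct dependencies; the sketch below includes them.

```python
from collections import deque

# Hypothetical toy dependency graph: package -> packages it declares directly.
# None of these names come from the paper.
TOY_GRAPH = {
    "app": ["trainer", "server"],
    "trainer": ["tensorlib", "dataloader"],
    "server": ["tensorlib", "httpkit"],
    "tensorlib": ["blas"],
    "dataloader": [],
    "httpkit": [],
    "blas": [],
}


def direct_dependencies(graph, roots):
    """Union of the packages declared directly by the root projects."""
    return {dep for root in roots for dep in graph.get(root, [])}


def transitive_dependencies(graph, roots):
    """All packages reachable from the roots (breadth-first), roots excluded."""
    seen = set()
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen - set(roots)


direct = direct_dependencies(TOY_GRAPH, ["app"])
transitive = transitive_dependencies(TOY_GRAPH, ["app"])
print(len(direct), len(transitive))
```

Even in this toy graph, the reachable set is larger than the declared set, which is the same pattern the reported counts show at a much larger scale.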
Abstract
AI systems rest on software with low integrity mechanisms, leaving AI systems exposed across every stage from data acquisition to final inference. This paper makes the AI supply chain a first-class object of analysis, decomposing it across four architectural layers: data acquisition, model training, model inference, and a cross-cutting substrate. Within these layers, we identify four structural gaps that traditional supply chain mechanisms do not address: verifiability, versioning, observability, and traceability. Current AI systems fall short on all of them: they carry undeclared behavioral couplings that no resolver enforces; they cannot be reverted to known working assemblies; they degrade silently rather than surfacing breaking changes; and their lineage can hardly be approximated. To illustrate the scale of the software supply chain of AI, we measure a reference stack of 48…
