Test Before You Deploy: Governing Updates in the LLM Supply Chain

Mohd Sameen Chishti; Damilare Peter Oyinloye; Jingyue Li

arXiv:2604.27789·cs.SE·May 1, 2026

Test Before You Deploy: Governing Updates in the LLM Supply Chain

Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li

PDF

TL;DR

This paper introduces a governance framework for managing updates in LLMs used in software, focusing on deployer-side controls to ensure compatibility amidst provider-driven model evolution.

Contribution

It proposes a novel deployment-side governance approach with rules, targeted testing, and checkpoints to mitigate behavioral drift in evolving LLMs.

Findings

01

Targeted testing uncovers regressions missed by overall metrics.

02

Open challenges include building effective test suites and setting performance thresholds.

03

Framework frames LLM update management as a software supply chain governance problem.

Abstract

Large Language Models (LLMs) are increasingly used as core dependencies in software systems. However, the hosted LLM services evolve continuously through provider-side updates without explicit version changes. These silent updates can introduce behavioral drift, causing regressions in functionality, formatting, safety constraints, or other application-specific requirements. Existing approaches focus primarily on regression testing or versioning but do not provide deployer-side mechanisms for governing compatibility during opaque model evolution. This paper proposes a deployment-side governance framework based on three components: clearly defined rules for how the model is allowed to behave (production contracts), focused testing organized by deployment risk categories (risk-category-based testing suite), and release checkpoints that block updates unless they meet defined safety and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.