Test Before You Deploy: Governing Updates in the LLM Supply Chain
Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li

TL;DR
This paper introduces a governance framework for managing updates to LLMs embedded in software systems, focusing on deployer-side controls that preserve compatibility as providers change their models.
Contribution
It proposes a deployment-side governance approach that combines production contracts (rules for allowed model behavior), risk-category-based testing, and release checkpoints to mitigate behavioral drift in evolving LLMs.
Findings
Targeted testing uncovers regressions missed by overall metrics.
Open challenges include building effective test suites and setting performance thresholds.
The framework treats LLM update management as a software supply chain governance problem.
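To illustrate the first finding, here is a minimal sketch of how an overall pass rate can improve while one category regresses. The category names and all counts are invented for illustration and are not results from the paper; `pass_rates` and `make_results` are hypothetical helpers.

```python
from collections import defaultdict

def pass_rates(results):
    """Compute (overall_rate, per_category_rates) from (category, passed) pairs."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        passes[category] += int(passed)
    overall = sum(passes.values()) / sum(totals.values())
    per_category = {c: passes[c] / totals[c] for c in totals}
    return overall, per_category

def make_results(counts):
    """Expand {category: (n_pass, n_fail)} into a flat list of test outcomes."""
    out = []
    for category, (n_pass, n_fail) in counts.items():
        out += [(category, True)] * n_pass + [(category, False)] * n_fail
    return out

# Made-up outcomes before and after a hypothetical silent provider update.
before = make_results({"functionality": (85, 15), "formatting": (45, 5), "safety": (20, 0)})
after = make_results({"functionality": (92, 8), "formatting": (46, 4), "safety": (15, 5)})

overall_before, by_cat_before = pass_rates(before)
overall_after, by_cat_after = pass_rates(after)

print(f"overall: {overall_before:.3f} -> {overall_after:.3f}")
for cat in by_cat_before:
    print(f"{cat}: {by_cat_before[cat]:.2f} -> {by_cat_after[cat]:.2f}")
```

With these invented numbers the overall rate rises (150/170 to 153/170) while the safety category drops from 20/20 to 15/20, which is the kind of regression a single aggregate metric would hide.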
Abstract
Large Language Models (LLMs) are increasingly used as core dependencies in software systems. However, hosted LLM services evolve continuously through provider-side updates without explicit version changes. These silent updates can introduce behavioral drift, causing regressions in functionality, formatting, safety constraints, or other application-specific requirements. Existing approaches focus primarily on regression testing or versioning but do not provide deployer-side mechanisms for governing compatibility during opaque model evolution. This paper proposes a deployment-side governance framework based on three components: clearly defined rules for how the model is allowed to behave (production contracts), focused testing organized by deployment risk categories (risk-category-based testing suite), and release checkpoints that block updates unless they meet defined safety and…
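The three components named in the abstract could fit together roughly as sketched below. This is an illustration under stated assumptions, not the authors' implementation: `ContractRule`, `evaluate`, `release_checkpoint`, the two example checks, the thresholds, and the canned outputs are all hypothetical, and a real suite would collect outputs by sending test prompts to the hosted model.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ContractRule:
    """One production-contract rule, tagged with a deployment risk category."""
    name: str
    risk_category: str
    check: Callable[[str], bool]

def is_valid_json(output: str) -> bool:
    try:
        json.loads(output)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

def has_no_email(output: str) -> bool:
    # Deliberately crude stand-in for a real safety check.
    return "@" not in output

# Hypothetical contract and per-category minimum pass rates.
RULES = [
    ContractRule("output_is_json", "formatting", is_valid_json),
    ContractRule("no_email_leak", "safety", has_no_email),
]
THRESHOLDS = {"formatting": 0.95, "safety": 1.0}

def evaluate(rules: List[ContractRule], outputs: List[str]) -> Dict[str, float]:
    """Run every rule on every output; return the pass rate per risk category."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for rule in rules:
        for out in outputs:
            total[rule.risk_category] = total.get(rule.risk_category, 0) + 1
            passed[rule.risk_category] = passed.get(rule.risk_category, 0) + int(rule.check(out))
    return {c: passed[c] / total[c] for c in total}

def release_checkpoint(rates: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the categories that block the update; an empty list means it may proceed."""
    return [c for c, minimum in thresholds.items() if rates.get(c, 0.0) < minimum]

# Canned outputs standing in for responses from an updated model.
candidate_outputs = [
    '{"status": "ok"}',
    '{"status": "ok", "note": "mail bob@example.com"}',
]
rates = evaluate(RULES, candidate_outputs)
blocked = release_checkpoint(rates, THRESHOLDS)
print(rates, blocked)
```

Here both outputs parse as JSON, so the formatting category passes, but one contains an email address, so the safety category falls below its threshold and the checkpoint blocks the update.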
