When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems

Emma Casey; David Roberts; David Sim; Ian Beaver

arXiv:2604.27082·cs.AI·May 1, 2026

TL;DR

This paper introduces a Bayesian framework for confidently migrating production LLM systems by calibrating automated metrics with human judgments, demonstrated on a large-scale commercial QA system.

Contribution

The paper presents a Bayesian approach to model comparison that combines automated metrics with a limited number of human evaluations to support enterprise LLM migration.

Findings

01

Successfully identified suitable replacement models for a large-scale QA system.

02

Calibrated automated evaluation metrics against human judgments using Bayesian methods.

03

The framework applies broadly to enterprise LLM deployment and migration.

Abstract

We present a framework for migrating production Large Language Model (LLM)-based systems when the underlying model reaches end-of-life or requires replacement. The key contribution is a Bayesian statistical approach that calibrates automated evaluation metrics against human judgments, enabling confident model comparison even with limited manual evaluation data. We demonstrate this framework on a commercial question-answering system serving 5.3M monthly interactions across six global regions, evaluating correctness, refusal behavior, and stylistic adherence to successfully identify suitable replacement models. The framework is broadly applicable to any enterprise deploying LLM-based products, providing a principled, reproducible methodology for model migration that balances quality assurance with evaluation efficiency. This is a capability increasingly essential as the LLM ecosystem…
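The abstract does not spell out the statistical model, so the following is only a minimal sketch of one way an automated metric could be calibrated against a small human-labeled sample; it should not be read as the authors' method. It assumes a binary automated judge (pass/fail), Beta(1, 1) priors, and a simple misclassification correction of the observed pass rate. All function names, parameters, and counts are hypothetical.

```python
import random


def beta_samples(successes, failures, n, rng, prior=(1.0, 1.0)):
    """Draw n samples from the Beta posterior of a rate (assumed Beta prior)."""
    a = prior[0] + successes
    b = prior[1] + failures
    return [rng.betavariate(a, b) for _ in range(n)]


def calibrated_rate_samples(auto_pass, auto_total, tp, fn, fp, tn,
                            n=4000, seed=0):
    """Posterior samples of the human-judged correct rate for one model.

    auto_pass / auto_total: automated-metric results on the large unlabeled set.
    tp, fn, fp, tn: counts from the small human-labeled set, where
      tp = human-correct and auto-pass,   fn = human-correct and auto-fail,
      fp = human-incorrect and auto-pass, tn = human-incorrect and auto-fail.
    """
    rng = random.Random(seed)
    sens = beta_samples(tp, fn, n, rng)   # P(auto pass | human correct)
    fpr = beta_samples(fp, tn, n, rng)    # P(auto pass | human incorrect)
    raw = beta_samples(auto_pass, auto_total - auto_pass, n, rng)
    out = []
    for s, f, r in zip(sens, fpr, raw):
        denom = s - f
        if denom <= 1e-6:
            continue  # the metric carries no signal in this draw; skip it
        # Invert raw = s * true + f * (1 - true), then clip to [0, 1].
        out.append(min(1.0, max(0.0, (r - f) / denom)))
    return out


def prob_not_worse(candidate, incumbent, margin=0.0):
    """Share of paired draws where the candidate is within margin of the incumbent."""
    pairs = list(zip(candidate, incumbent))
    return sum(c >= i - margin for c, i in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    cand = calibrated_rate_samples(900, 1000, tp=45, fn=5, fp=5, tn=45, seed=1)
    inc = calibrated_rate_samples(700, 1000, tp=45, fn=5, fp=5, tn=45, seed=2)
    print("P(candidate not worse than incumbent):", prob_not_worse(cand, inc))
```

The design choice here is that the uncertainty from the small human-labeled sample flows into the final comparison: a noisy judge widens the posterior on the calibrated rate instead of being treated as ground truth. A migration decision could then be phrased as requiring `prob_not_worse` to exceed a chosen threshold for each evaluated dimension.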