Provably Safe Model Updates
Leo Elmecker-Plakolm, Pierre Fasterling, Philip Sosnin, Calvin Tsay, Matthew Wicker

TL;DR
This paper introduces a formal framework for certifying the safety of model updates in dynamic, safety-critical environments, ensuring models meet specifications despite distribution shifts.
Contribution
It formalizes the problem of safe model updates as computing the largest locally invariant domain and develops a tractable primal-dual approach for certification.
Findings
Efficient certification of model updates independent of data or algorithms
Matches or exceeds heuristic baselines in continual learning and foundation model fine-tuning
Provides formal safety guarantees for model updates
Abstract
Safety-critical environments are inherently dynamic. Distribution shifts, emerging vulnerabilities, and evolving requirements demand continuous updates to machine learning models. Yet even benign parameter updates can have unintended consequences, such as catastrophic forgetting in classical models or alignment drift in foundation models. Existing heuristic approaches (e.g., regularization, parameter isolation) can mitigate these effects but cannot certify that updated models continue to satisfy required performance specifications. We address this problem by introducing a framework for provably safe model updates. Our approach first formalizes the problem as computing the largest locally invariant domain (LID): a connected region in parameter space where all points are certified to satisfy a given specification. While exact maximal LID computation is intractable, we show that relaxing…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
