An Empirical Analysis of Backward Compatibility in Machine Learning Systems
Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz

TL;DR
This paper investigates how updates to machine learning models can break backward compatibility, causing errors in downstream systems, and highlights the importance of compatibility-aware methods for reliable ML deployment.
Contribution
The paper provides an empirical analysis of backward compatibility challenges in ML, identifying their causes and outlining considerations for robustness and de-noising techniques.
Findings
Compatibility issues occur even without data shift due to stochastic optimization.
Training on noisy data can decrease backward compatibility despite accuracy gains.
Incompatible points tend to align with noise bias, indicating a need for robustness methods.
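To make the findings above concrete, here is a minimal sketch of one common way to quantify backward compatibility between an old and an updated classifier: the fraction of examples the old model classified correctly that the new model also classifies correctly. This is an illustrative assumption, not necessarily the exact metric the paper uses. The function names (`accuracy`, `backward_compatibility`, `incompatible_points`) and the toy labels are hypothetical.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Plain aggregate accuracy: share of predictions that match the labels."""
    return sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)


def incompatible_points(
    y_true: Sequence[int], old_pred: Sequence[int], new_pred: Sequence[int]
) -> List[int]:
    """Indices the old model got right but the new model gets wrong."""
    return [
        i
        for i, (y, o, n) in enumerate(zip(y_true, old_pred, new_pred))
        if o == y and n != y
    ]


def backward_compatibility(
    y_true: Sequence[int], old_pred: Sequence[int], new_pred: Sequence[int]
) -> float:
    """Share of the old model's correct predictions that the new model preserves."""
    if not (len(y_true) == len(old_pred) == len(new_pred)):
        raise ValueError("all inputs must have the same length")
    old_correct = sum(1 for y, o in zip(y_true, old_pred) if y == o)
    if old_correct == 0:
        # Assumed convention: with no old successes, nothing can be broken.
        return 1.0
    broken = len(incompatible_points(y_true, old_pred, new_pred))
    return (old_correct - broken) / old_correct


# Toy data (hypothetical): the update raises accuracy but introduces a new error.
y_true = [0, 1, 1, 0, 1, 0]
old_pred = [0, 1, 0, 0, 0, 0]  # correct on indices 0, 1, 3, 5
new_pred = [0, 0, 1, 0, 1, 0]  # correct on indices 0, 2, 3, 4, 5

old_acc = accuracy(y_true, old_pred)
new_acc = accuracy(y_true, new_pred)
score = backward_compatibility(y_true, old_pred, new_pred)
new_errors = incompatible_points(y_true, old_pred, new_pred)
```

In this toy case the update improves aggregate accuracy from 4/6 to 5/6, yet the compatibility score is 0.75 because index 1 was handled correctly before the update and is misclassified after it. An aggregate-only evaluation would not surface that regression.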
Abstract
In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance. However, current practices for updating models rely solely on isolated, aggregate performance analyses, overlooking important dependencies, expectations, and needs in real-world deployments. We consider how updates, intended to improve ML models, can introduce new errors that can significantly affect downstream systems and users. For example, updates in models used in cloud-based classification services, such as image recognition, can cause unexpected erroneous behavior in systems that make calls to the services. Prior work has shown the importance of "backward compatibility" for maintaining human trust. We study challenges with backward compatibility across different ML architectures and datasets, focusing on common settings including data shifts with structured noise and…
