Test & Evaluation Best Practices for Machine Learning-Enabled Systems
Jaganmohan Chandrasekaran, Tyler Cody, Nicola McCarthy, Erin Lanus, Laura Freeman

TL;DR
This paper reviews best practices for testing and evaluating machine learning-enabled systems throughout their lifecycle, emphasizing the need for systematic approaches beyond component testing to ensure reliability.
Contribution
It categorizes the ML system lifecycle and highlights the gaps in current T&E practices, proposing the need for new systematic testing methods and metrics.
Findings
Limited T&E practices beyond component level
Need for systematic T&E strategies across all lifecycle stages
Existing ad-hoc practices can undermine system reliability
Abstract
Machine learning (ML)-based software systems are rapidly gaining adoption across various domains, making it increasingly essential to ensure they perform as intended. This report presents best practices for the Test and Evaluation (T&E) of ML-enabled software systems across their lifecycle. We categorize the lifecycle of ML-enabled software systems into three stages: component, integration and deployment, and post-deployment. At the component level, the primary objective is to test and evaluate the ML model as a standalone component. Next, in the integration and deployment stage, the goal is to evaluate an integrated ML-enabled system consisting of both ML and non-ML components. Finally, once the ML-enabled software system is deployed and operationalized, the T&E objective is to ensure the system performs as intended. Maintenance activities for ML-enabled software systems span the…
Taxonomy
Topics: Data Quality and Management · Software System Performance and Reliability · Software Reliability and Analysis Research
