Test & Evaluation Best Practices for Machine Learning-Enabled Systems

Jaganmohan Chandrasekaran; Tyler Cody; Nicola McCarthy; Erin Lanus; Laura Freeman

arXiv:2310.06800 · cs.SE · October 11, 2023

Open Access

TL;DR

This paper reviews best practices for testing and evaluating machine learning-enabled systems throughout their lifecycle, emphasizing the need for systematic approaches beyond component testing to ensure reliability.

Contribution

It categorizes the ML system lifecycle into three stages (component, integration and deployment, and post-deployment), highlights gaps in current T&E practices, and argues that new systematic testing methods and metrics are needed.

Findings

01 Limited T&E practices beyond component level

02 Need for systematic T&E strategies across all lifecycle stages

03 Existing ad-hoc practices can undermine system reliability

Abstract

Machine learning (ML)-based software systems are rapidly gaining adoption across various domains, making it increasingly essential to ensure they perform as intended. This report presents best practices for the Test and Evaluation (T&E) of ML-enabled software systems across their lifecycle. We categorize the lifecycle of ML-enabled software systems into three stages: component, integration and deployment, and post-deployment. At the component level, the primary objective is to test and evaluate the ML model as a standalone component. Next, in the integration and deployment stage, the goal is to evaluate an integrated ML-enabled system consisting of both ML and non-ML components. Finally, once the ML-enabled software system is deployed and operationalized, the T&E objective is to ensure the system performs as intended. Maintenance activities for ML-enabled software systems span the…

Taxonomy

Topics: Data Quality and Management · Software System Performance and Reliability · Software Reliability and Analysis Research