Measuring AI Systems Beyond Accuracy
Violet Turri, Rachel Dzombak, Eric Heim, Nathan VanHoudnos, Jay Palat, Anusha Sinha

TL;DR
This paper advocates for a comprehensive, integrated approach to testing AI systems, emphasizing the need for cross-domain evaluation methods that assess reliability beyond traditional accuracy metrics.
Contribution
It introduces six key questions to guide a holistic testing and evaluation strategy for AI systems, promoting a more complete assessment framework.
Findings
Highlights limitations of current T&E methods
Proposes a set of guiding questions for holistic evaluation
Encourages cross-domain and lifecycle-aware testing approaches
Abstract
Current test and evaluation (T&E) methods for assessing machine learning (ML) system performance often rely on incomplete metrics. Testing is additionally often siloed from the other phases of the ML system lifecycle. Research investigating cross-domain approaches to ML T&E is needed to drive the state of the art forward and to build an Artificial Intelligence (AI) engineering discipline. This paper advocates for a robust, integrated approach to testing by outlining six key questions for guiding a holistic T&E strategy.
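The abstract's point that accuracy alone can be an incomplete metric is easy to see in a small example. The sketch below is not from the paper. It is a hypothetical illustration with made-up labels, and the helper names (`confusion_counts`, `summarize`) are chosen only for this example. A classifier that always predicts the majority class on a 95/5 imbalanced dataset scores high accuracy while never detecting the minority class.

```python
# Hypothetical illustration (not from the paper): on imbalanced data,
# a model that always predicts the majority class scores high accuracy
# while never detecting the minority (positive) class.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary labeling."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def summarize(y_true, y_pred, positive=1):
    """Report accuracy alongside precision and recall for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Precision and recall are defined as 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# 95 negatives, 5 positives; the "model" always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(summarize(y_true, y_pred))
```

Running it prints `{'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0}`. Reporting complementary metrics side by side exposes this failure, though it is only a narrow illustration of measuring beyond accuracy, not the holistic T&E strategy the paper outlines.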
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Fault Detection and Control Systems · Machine Learning and Data Classification
