A Holistic Assessment of the Reliability of Machine Learning Systems
Anthony Corso, David Karamadian, Romeo Valentin, Mary Cooper, Mykel J. Kochenderfer

TL;DR
This paper introduces a comprehensive framework for assessing the reliability of machine learning systems across multiple key properties, providing a holistic view of their robustness in high-stakes applications.
Contribution
It proposes a new holistic assessment methodology and reliability score, evaluating multiple reliability metrics and analyzing over 500 models to identify techniques that improve overall system dependability.
Findings
Designing for one metric does not necessarily limit performance on the others
Certain algorithmic techniques can improve several reliability metrics at once
The framework evaluates five reliability properties and summarizes them in a single reliability score
Abstract
As machine learning (ML) systems increasingly permeate high-stakes settings such as healthcare, transportation, military, and national security, concerns regarding their reliability have emerged. Despite notable progress, the performance of these systems can significantly diminish due to adversarial attacks or environmental changes, leading to overconfident predictions, failures to detect input faults, and an inability to generalize in unexpected scenarios. This paper proposes a holistic assessment methodology for the reliability of ML systems. Our framework evaluates five key properties: in-distribution accuracy, distribution-shift robustness, adversarial robustness, calibration, and out-of-distribution detection. A reliability score is also introduced and used to assess the overall system reliability. To provide insights into the performance of different algorithmic approaches, we…
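The abstract names five properties and a single reliability score but, as excerpted here, does not give the score's formula. The sketch below is only an illustration of how five per-property scores could be combined. It assumes each score has already been rescaled to [0, 1] with higher meaning better, and it uses a weighted mean as a stand-in aggregation. The names `ReliabilityMetrics` and `reliability_score`, the weighting scheme, and the example values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class ReliabilityMetrics:
    """Five per-property scores (hypothetical container).

    Assumption: every score is rescaled to [0, 1], higher is better.
    """
    in_distribution_accuracy: float
    distribution_shift_robustness: float
    adversarial_robustness: float
    calibration: float
    ood_detection: float


def reliability_score(
    metrics: ReliabilityMetrics,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine the five scores with a weighted mean.

    Assumption: a weighted mean is an illustrative choice, not the
    aggregation defined in the paper.
    """
    names = [f.name for f in fields(metrics)]
    values = {name: getattr(metrics, name) for name in names}

    # Reject scores outside the assumed [0, 1] range.
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    # Default to equal weights when none are supplied.
    if weights is None:
        weights = {name: 1.0 for name in names}

    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    return sum(weights[name] * values[name] for name in names) / total_weight


if __name__ == "__main__":
    # Made-up scores for a single model, for illustration only.
    example = ReliabilityMetrics(
        in_distribution_accuracy=0.9,
        distribution_shift_robustness=0.7,
        adversarial_robustness=0.5,
        calibration=0.8,
        ood_detection=0.6,
    )
    print(round(reliability_score(example), 3))
```

Keeping the five scores separate until the final step means a low value on one property, such as adversarial robustness, stays visible next to the aggregate instead of being hidden by it.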
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Software Reliability and Analysis Research
