Evaluation of Predictive Reliability to Foster Trust in Artificial Intelligence. A case study in Multiple Sclerosis

Lorenzo Peracchio; Giovanna Nicora; Enea Parimbelli; Tommaso Mario Buonocore; Roberto Bergamaschi; Eleonora Tavazzi; Arianna Dagliati; Riccardo Bellazzi

arXiv:2402.17554 · cs.LG · February 28, 2024 · 2 citations

TL;DR

This paper presents a method to evaluate the reliability of ML predictions in critical applications like medicine, using Autoencoders for Out-of-Distribution detection and a proxy model for performance assessment, demonstrated on Multiple Sclerosis data.

Contribution

The paper introduces relAI, a Python package that integrates reliability measures into ML pipelines, enhancing trustworthiness in clinical decision-making.
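The summary does not describe relAI's actual interface, so what follows is only a minimal sketch of the general pattern the abstract describes. The prediction is still returned, but it is paired with a reliability verdict so that a decision-maker can accept or reject it. `GatedPrediction`, `predict_with_reliability`, and the toy range check are hypothetical names, not relAI's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical type: a named check that returns True when a prediction on x can be trusted.
Check = Tuple[str, Callable[[Any], bool]]


@dataclass
class GatedPrediction:
    label: Any
    reliable: bool
    failed_checks: List[str] = field(default_factory=list)


def predict_with_reliability(
    predict: Callable[[Any], Any], checks: Sequence[Check], x: Any
) -> GatedPrediction:
    """Return the model's prediction together with a reliability verdict.

    The prediction is never suppressed. It is only flagged, which leaves the
    final accept/reject decision to the clinician.
    """
    failed = [name for name, check in checks if not check(x)]
    return GatedPrediction(label=predict(x), reliable=not failed, failed_checks=failed)


if __name__ == "__main__":
    def predict(x: float) -> int:
        # Toy threshold classifier.
        return int(x > 5)

    def in_training_range(x: float) -> bool:
        # Toy stand-in for an OOD check: assume training data covered 0 <= x <= 10.
        return 0 <= x <= 10

    checks = [("in_training_range", in_training_range)]
    print(predict_with_reliability(predict, checks, 7))   # reliable
    print(predict_with_reliability(predict, checks, 42))  # flagged as unreliable
```

In a real pipeline the toy range check would be replaced by the reliability measures themselves, such as an autoencoder-based OOD test and an estimate of local performance.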

Findings

01 · Effective at detecting Out-of-Distribution (OOD) instances
02 · Reliability assessments correlate with model performance on similar samples
03 · Supports clinical decision-making by flagging unreliable predictions
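The TL;DR mentions a proxy model for assessing performance, but the summary does not say how it is built. As a simpler, clearly swapped-in stand-in, "performance on similar samples" can be estimated directly as the classifier's accuracy on a new instance's k nearest training neighbours. The function name `local_accuracy`, Euclidean distance, and `k` are assumptions for illustration. This is not the paper's proxy model.

```python
import numpy as np


def local_accuracy(X_train, y_train, y_pred_train, X_new, k=10):
    """For each row of X_new, return the classifier's accuracy on its k nearest
    training samples (Euclidean distance).

    Illustrative stand-in, not the paper's proxy model: a low value means the
    classifier tended to be wrong on instances similar to this one.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    # Boolean vector: was each training sample classified correctly?
    correct = np.asarray(y_pred_train) == np.asarray(y_train)
    # Pairwise distances between new and training samples, shape (n_new, n_train).
    dists = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training samples for each new instance.
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return correct[neighbours].mean(axis=1)


if __name__ == "__main__":
    # Two 1-D clusters; the (hypothetical) classifier fails mostly in the right one.
    X_train = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
    y_train = np.array([0, 0, 0, 1, 1, 1])
    y_pred_train = np.array([0, 0, 0, 0, 0, 1])
    print(local_accuracy(X_train, y_train, y_pred_train, [[0.05], [10.05]], k=3))
```

Here the instance near 0 gets a local accuracy of 1.0 and the one near 10 about 0.33, so a prediction on the second would be treated as less reliable. Passing out-of-fold (cross-validated) predictions as `y_pred_train` avoids the optimistic bias of scoring the model on the data it was fitted on.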

Abstract

Applying Artificial Intelligence (AI) and Machine Learning (ML) in critical contexts, such as medicine, requires safety measures that reduce the risk of harm when predictions are wrong. Spotting ML failures is of paramount importance when ML predictions are used to drive clinical decisions. ML predictive reliability measures the degree of trust in an ML prediction on a new instance, allowing decision-makers to accept or reject it based on its reliability. To assess reliability, we propose a method that implements two principles. First, our approach evaluates whether an instance to be classified comes from the same distribution as the training set. To do this, we leverage the ability of Autoencoders (AEs) to reconstruct the training set with low error: an instance is considered Out-of-Distribution (OOD) if the AE reconstructs it with a high error. Second, it is…
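The first principle (flag an instance as OOD when the autoencoder reconstructs it with high error) can be illustrated with a minimal sketch under two stated assumptions. First, the sketch swaps the neural autoencoder for a linear one with a k-unit bottleneck, whose optimal weights under squared-error loss span the top-k principal subspace and can therefore be fitted in closed form with an SVD. Second, "high error" is taken to mean above the 95th percentile of training reconstruction errors. The class name `LinearAutoencoderOOD` and the quantile rule are hypothetical choices, not relAI's API.

```python
import numpy as np


class LinearAutoencoderOOD:
    """Reconstruction-error OOD detector built on a linear autoencoder.

    Assumption: encoder and decoder are linear with a k-unit bottleneck. Under
    squared-error loss the optimal linear autoencoder projects inputs onto the
    top-k principal subspace, so the weights come from an SVD rather than
    from gradient descent.
    """

    def __init__(self, n_components=2, quantile=0.95):
        self.n_components = n_components  # bottleneck size k
        self.quantile = quantile          # training-error quantile used as threshold

    def fit(self, X_train):
        X_train = np.asarray(X_train, dtype=float)
        self.mean_ = X_train.mean(axis=0)
        # Rows of Vt are principal directions; the top-k form the encoder weights.
        _, _, Vt = np.linalg.svd(X_train - self.mean_, full_matrices=False)
        self.W_ = Vt[: self.n_components].T  # shape (n_features, k)
        # Training data should reconstruct well; errors above this quantile count as "high".
        self.threshold_ = np.quantile(self.reconstruction_error(X_train), self.quantile)
        return self

    def reconstruction_error(self, X):
        Xc = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean_
        codes = Xc @ self.W_          # encode to the k-dimensional bottleneck
        X_hat = codes @ self.W_.T     # decode back to input space
        return np.mean((Xc - X_hat) ** 2, axis=1)  # per-instance mean squared error

    def is_ood(self, X):
        return self.reconstruction_error(X) > self.threshold_


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic training data lying close to a 2-D plane inside 5-D space.
    mixing = rng.normal(size=(2, 5))
    X_train = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 5))
    detector = LinearAutoencoderOOD(n_components=2).fit(X_train)
    X_far = 5 * rng.normal(size=(3, 5))  # points that leave the plane
    print(detector.is_ood(X_far))
```

With a nonlinear AE (for example, one trained in PyTorch), only `fit` and `reconstruction_error` would change. The thresholding step stays the same. By construction, roughly 5% of the training set itself falls above the threshold, so the quantile directly trades false OOD alarms against sensitivity.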

Code & Models

Repositories

bmi-labmedinfo/relai (PyTorch, official)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Occupational Health and Safety Research

Methods: Sparse Evolutionary Training · Autoencoders