Automated Trustworthiness Testing for Machine Learning Classifiers

Steven Cho; Seaton Cousins-Baxter; Stefano Ruberto; Valerio Terragni

arXiv:2406.05251·cs.LG·June 11, 2024

TL;DR

This paper introduces TOWER, an automatic method to evaluate the trustworthiness of text classifiers by analyzing explanations with word embeddings, aiming to improve trust assessment without human judgment.

Contribution

TOWER is the first technique to automatically generate trustworthiness oracles for text classifiers using explanation analysis and word embeddings.
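The idea of scoring an explanation with word embeddings can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the use of cosine similarity, the averaging step, the 0.5 threshold, and the names `TOY_EMBEDDINGS`, `explanation_relevance`, and `oracle` are all assumptions made for illustration. The sketch takes the words an explanation technique ranked as most influential and checks how semantically close they are to the predicted class name.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for this example.
# A real setup would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "sports":   [0.9, 0.1, 0.0],
    "football": [0.8, 0.2, 0.1],
    "goal":     [0.7, 0.3, 0.0],
    "the":      [0.1, 0.1, 0.9],
    "of":       [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def explanation_relevance(class_name, top_words, embeddings):
    """Mean similarity between the class name and the explanation words.

    Words missing from the embedding table are skipped (an assumption).
    """
    class_vec = embeddings[class_name]
    scores = [cosine(class_vec, embeddings[w]) for w in top_words if w in embeddings]
    return sum(scores) / len(scores) if scores else 0.0


def oracle(class_name, top_words, embeddings, threshold=0.5):
    """Label a prediction by comparing relevance to an assumed threshold."""
    relevance = explanation_relevance(class_name, top_words, embeddings)
    return "trustworthy" if relevance >= threshold else "untrustworthy"


if __name__ == "__main__":
    # Explanation words related to the predicted class.
    print(oracle("sports", ["football", "goal"], TOY_EMBEDDINGS))
    # Explanation dominated by function words unrelated to the class.
    print(oracle("sports", ["the", "of"], TOY_EMBEDDINGS))
```

In this toy setting, a prediction of "sports" explained by "football" and "goal" scores high, while the same prediction explained by "the" and "of" scores low. That contrast is the kind of signal an automated oracle could use in place of a human judging plausibility.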

Findings

1. TOWER detects trustworthiness decline with increasing noise.

2. TOWER's effectiveness varies when compared to human-labeled data.

3. Initial results support the hypothesis that explanation relevance correlates with trustworthiness.

Abstract

Machine Learning (ML) has become an integral part of our society, commonly used in critical domains such as finance, healthcare, and transportation. Therefore, it is crucial to evaluate not only whether ML models make correct predictions but also whether they do so for the correct reasons, so that we can trust they will perform well on unseen data. This concept is known as trustworthiness in ML. Recently, explainability techniques (e.g., LIME, SHAP) have been developed to interpret the decision-making processes of ML models, providing explanations for their predictions (e.g., the words in the input that influenced the prediction the most). Assessing the plausibility of these explanations can enhance our confidence in the models' trustworthiness. However, current approaches typically rely on human judgment to determine the plausibility of these explanations. This paper proposes TOWER, the first…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Local Interpretable Model-Agnostic Explanations (LIME)