Automated Trustworthiness Testing for Machine Learning Classifiers
Steven Cho, Seaton Cousins-Baxter, Stefano Ruberto, Valerio Terragni

TL;DR
This paper introduces TOWER, an automatic method to evaluate the trustworthiness of text classifiers by analyzing explanations with word embeddings, aiming to improve trust assessment without human judgment.
Contribution
TOWER is the first technique to automatically generate trustworthiness oracles for text classifiers using explanation analysis and word embeddings.
Findings
TOWER detects trustworthiness decline with increasing noise.
TOWER's effectiveness varies when compared to human-labeled data.
Initial results support the hypothesis that explanation relevance correlates with trustworthiness.
Abstract
Machine Learning (ML) has become an integral part of our society, commonly used in critical domains such as finance, healthcare, and transportation. Therefore, it is crucial to evaluate not only whether ML models make correct predictions but also whether they do so for the correct reasons, ensuring our trust that they will perform well on unseen data. This concept is known as trustworthiness in ML. Recently, explainable techniques (e.g., LIME, SHAP) have been developed to interpret the decision-making processes of ML models, providing explanations for their predictions (e.g., words in the input that influenced the prediction the most). Assessing the plausibility of these explanations can enhance our confidence in the models' trustworthiness. However, current approaches typically rely on human judgment to determine the plausibility of these explanations. This paper proposes TOWER, the first…
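The abstract is truncated here, so the exact procedure TOWER uses is not stated on this page. As a rough illustration of the general idea (judging whether explanation words are semantically related to the predicted class by using word embeddings), the sketch below scores an explanation by the mean cosine similarity between the class name and each explanation word. Everything in it is an assumption made for illustration: the toy 3-dimensional vectors, the function names `relevance_score` and `is_plausible`, and the 0.5 threshold are hypothetical and are not taken from the paper. In practice the explanation words would come from a tool such as LIME or SHAP and the vectors from a pretrained embedding model.

```python
import math

# Toy 3-dimensional "embeddings", invented purely for illustration.
# A real setup would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "sports": [0.9, 0.1, 0.0],
    "football": [0.8, 0.2, 0.1],
    "goal": [0.7, 0.3, 0.0],
    "the": [0.0, 0.1, 0.9],
    "tuesday": [0.1, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relevance_score(class_name, explanation_words, embeddings):
    """Mean cosine similarity between the class name and the explanation words.

    Words missing from the vocabulary are skipped (an assumption of this sketch).
    """
    class_vec = embeddings[class_name]
    sims = [
        cosine(class_vec, embeddings[word])
        for word in explanation_words
        if word in embeddings
    ]
    return sum(sims) / len(sims) if sims else 0.0


def is_plausible(class_name, explanation_words, embeddings, threshold=0.5):
    """Hypothetical oracle: the explanation counts as plausible if its
    relevance score reaches the (arbitrary) threshold."""
    return relevance_score(class_name, explanation_words, embeddings) >= threshold


# Explanation words related to the class score high; unrelated ones score low.
related = relevance_score("sports", ["football", "goal"], TOY_EMBEDDINGS)
unrelated = relevance_score("sports", ["the", "tuesday"], TOY_EMBEDDINGS)
print(related > unrelated)
```

Under these assumptions, a classifier that predicts "sports" because of words like "football" would be flagged as plausible, while one relying on words like "tuesday" would not. The threshold would need to be calibrated against human judgments, which is the comparison the Findings section alludes to.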
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Local Interpretable Model-Agnostic Explanations (LIME)
