Don't Lie to Me: Avoiding Malicious Explanations with STEALTH

Lauren Alvarez; Tim Menzies

arXiv:2301.10407·cs.SE·January 26, 2023


TL;DR

STEALTH is an approach that asks an AI model very few queries (one per data cluster), so that a malicious model can neither detect that it is being examined nor know when to lie, which also avoids the associated unfairness issues.

Contribution

It combines recursive bi-clustering of the data with limited querying of class labels, so that an AI-generated model can be used without exposure to malicious explanations or the associated unfairness issues.

Findings

01 Reduces the number of queries needed to detect malicious behavior

02 Prevents malicious algorithms from detecting STEALTH's operation or knowing when to lie

03 Enhances robustness of AI explanations against attacks

Abstract

STEALTH is a method for using an AI-generated model without suffering from malicious attacks (i.e., lying) or the associated unfairness issues. After recursively bi-clustering the data, the STEALTH system asks the AI model a limited number of queries about class labels. STEALTH asks so few queries (one per data cluster) that malicious algorithms (a) cannot detect its operation and (b) cannot know when to lie.
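The two steps described in the abstract (recursive bi-clustering, then one class-label query per cluster) can be illustrated with a minimal sketch. The abstract does not say how the data are split or which row in a cluster is queried, so everything specific here is an assumption made for illustration: the distance-based split between two far-apart "poles", the `min_size` stopping rule, the random choice of a representative, and the names `bicluster`, `query_once_per_cluster`, and `model`. It is not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def dist(a: Row, b: Row) -> float:
    """Euclidean distance between two numeric rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def bicluster(rows: List[Row], min_size: int, rng: random.Random) -> List[List[Row]]:
    """Recursively split rows in two until each cluster has at most min_size rows.

    Assumed split rule (not from the paper's abstract): find two distant
    "poles", order rows by relative closeness to them, cut at the median.
    """
    if len(rows) <= min_size:
        return [rows]
    anchor = rng.choice(rows)
    east = max(rows, key=lambda r: dist(r, anchor))  # far from a random row
    west = max(rows, key=lambda r: dist(r, east))    # far from the first pole
    ordered = sorted(rows, key=lambda r: dist(r, east) - dist(r, west))
    mid = len(ordered) // 2
    return (bicluster(ordered[:mid], min_size, rng)
            + bicluster(ordered[mid:], min_size, rng))


def query_once_per_cluster(
    rows: Sequence[Row],
    model: Callable[[Row], int],
    min_size: int = 4,
    seed: int = 0,
) -> List[Tuple[Row, int]]:
    """Ask the (possibly untrusted) model for one label per cluster."""
    rng = random.Random(seed)
    clusters = bicluster(list(rows), max(1, min_size), rng)
    labelled = []
    for cluster in clusters:
        representative = rng.choice(cluster)  # assumed: any row stands in for its cluster
        labelled.append((representative, model(representative)))
    return labelled
```

With 64 rows and `min_size=4`, the halving runs 64, 32, 16, 8, 4, so the model is queried 16 times instead of 64. The query count grows with the number of clusters, not the number of rows, which is the property the abstract relies on.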


Taxonomy

Topics: Imbalanced Data Classification Techniques · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms