Don't Lie to Me: Avoiding Malicious Explanations with STEALTH
Lauren Alvarez, Tim Menzies

TL;DR
STEALTH is a novel approach that queries an AI model so sparingly (one class-label query per data cluster) that a malicious model can neither detect that it is being examined nor know when to lie, which guards against malicious explanations and the associated unfairness issues.
Contribution
It introduces a recursive bi-clustering method combined with limited querying to safeguard AI models from malicious explanations and unfairness issues.
Findings
Needs only one class-label query per data cluster
Prevents malicious algorithms from detecting STEALTH's operation or knowing when to lie
Enhances robustness of AI explanations against attacks
Abstract
STEALTH is a method for using some AI-generated model without suffering from malicious attacks (i.e., lying) or associated unfairness issues. After recursively bi-clustering the data, the STEALTH system asks the AI model a limited number of queries about class labels. STEALTH asks so few queries (1 per data cluster) that malicious algorithms (a) cannot detect its operation, nor (b) know when to lie.
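The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the splitting rule (projecting rows onto a line between two distant points and cutting at the median), the stopping rule (leaves of about sqrt(N) rows), the choice of the row nearest the centroid as each cluster's representative, and all function names (`bi_cluster`, `representative`, `sample_labels`) are assumptions made for the example. The only details taken from the abstract are that the data is recursively bi-clustered and that the model is asked one class-label query per cluster.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bi_cluster(rows, min_size, rng):
    """Recursively split rows in two; return the list of leaf clusters.

    Assumed splitting rule: pick two far-apart rows, project every row
    onto the line between them, and cut at the median position.
    """
    if len(rows) < 2 * min_size:
        return [rows]
    anchor = rng.choice(rows)
    east = max(rows, key=lambda r: dist(r, anchor))
    west = max(rows, key=lambda r: dist(r, east))
    c = dist(east, west)
    if c == 0:  # all rows identical, nothing to split
        return [rows]

    def project(r):
        # Cosine rule: position of r along the east-west line.
        a, b = dist(r, east), dist(r, west)
        return (a * a + c * c - b * b) / (2 * c)

    ordered = sorted(rows, key=project)
    mid = len(ordered) // 2
    return (bi_cluster(ordered[:mid], min_size, rng)
            + bi_cluster(ordered[mid:], min_size, rng))


def representative(cluster):
    """Assumed choice: the row closest to the cluster centroid."""
    centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
    return min(cluster, key=lambda r: dist(r, centroid))


def sample_labels(rows, model, min_size=None, seed=0):
    """Ask the model for exactly one class label per leaf cluster."""
    rng = random.Random(seed)
    min_size = min_size or max(1, int(math.sqrt(len(rows))))
    leaves = bi_cluster(rows, min_size, rng)
    return [(rep, model(rep)) for rep in map(representative, leaves)]
```

Under these assumptions, 64 rows with the default leaf size split into 8 leaf clusters, so the model sees 8 queries instead of 64. Here `model` stands for any callable that maps a row to a class label.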
Taxonomy
Topics: Imbalanced Data Classification Techniques · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
