With False Friends Like These, Who Can Notice Mistakes?
Lue Tao, Lei Feng, Jinfeng Yi, Songcan Chen

TL;DR
This paper introduces hypocritical examples, a new security threat in machine learning where false friends conceal model mistakes, potentially misleading evaluations and causing unexpected failures in real-world deployments.
Contribution
It uncovers the threat of hypocritical examples, proposes a metric to measure hypocritical risk, and evaluates countermeasures to mitigate this security vulnerability.
Findings
Many substandard models are vulnerable to hypocritical examples across datasets.
Countermeasures can reduce hypocritical risk but do not eliminate it entirely.
Hypocritical risk persists even after adaptive robust training.
Abstract
Adversarial examples crafted by an explicit adversary have attracted significant attention in machine learning. However, the security risk posed by a potential false friend has been largely overlooked. In this paper, we unveil the threat of hypocritical examples -- inputs that are originally misclassified yet perturbed by a false friend to force correct predictions. While such perturbed examples seem harmless, we point out for the first time that they could be maliciously used to conceal the mistakes of a substandard (i.e., not as good as required) model during an evaluation. Once a deployer trusts the hypocritical performance and applies the "well-performed" model in real-world applications, unexpected failures may happen even in benign environments. More seriously, this security risk seems to be pervasive: we find that many types of substandard models are vulnerable to hypocritical…
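The abstract's definition can be made concrete with a toy example. The following is a minimal sketch, not the authors' implementation. It assumes a linear binary classifier, an L-infinity perturbation budget `eps`, and projected sign-gradient descent on the logistic loss. Those choices, and the names `predict`, `hypocritical_perturb`, and `hypocritical_risk`, are illustrative assumptions. The risk estimate here is the fraction of misclassified inputs that a false friend can flip to correct, which is one plausible reading of "hypocritical risk" and may differ from the paper's exact metric.

```python
import math


def predict(w, b, x):
    """Label in {-1, +1} from a linear classifier (toy stand-in for a model)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def sign(v):
    return (v > 0) - (v < 0)


def hypocritical_perturb(w, b, x, y, eps, steps=10):
    """Push x TOWARD its true label y within an L-inf ball of radius eps.

    An adversarial attack would ascend the loss; a false friend descends it.
    """
    step = eps / 4
    x_hyp = list(x)
    for _ in range(steps):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x_hyp)) + b)
        # Gradient of log(1 + exp(-margin)) with respect to the input.
        coeff = -y / (1.0 + math.exp(margin))
        grad = [coeff * wi for wi in w]
        # Signed gradient DEScent step, then projection back into the ball.
        x_hyp = [xi - step * sign(g) for xi, g in zip(x_hyp, grad)]
        x_hyp = [min(max(xh, x0 - eps), x0 + eps) for xh, x0 in zip(x_hyp, x)]
    return x_hyp


def hypocritical_risk(w, b, data, eps):
    """Assumed estimate: share of misclassified inputs that can be made correct."""
    wrong = [(x, y) for x, y in data if predict(w, b, x) != y]
    if not wrong:
        return 0.0
    flipped = sum(
        predict(w, b, hypocritical_perturb(w, b, x, y, eps)) == y for x, y in wrong
    )
    return flipped / len(wrong)


# Hypothetical toy model and evaluation set (not from the paper).
W, B, EPS = [1.0, -1.0], 0.0, 0.1
DATA = [
    ([0.2, 0.1], 1),     # correct
    ([0.1, 0.2], 1),     # wrong, close to the boundary
    ([0.0, 1.0], 1),     # wrong, far from the boundary
    ([-0.3, -0.2], -1),  # correct
    ([0.3, 0.25], -1),   # wrong, close to the boundary
]

clean_acc = sum(predict(W, B, x) == y for x, y in DATA) / len(DATA)
hyp_acc = sum(
    predict(W, B, hypocritical_perturb(W, B, x, y, EPS)) == y for x, y in DATA
) / len(DATA)
risk = hypocritical_risk(W, B, DATA, EPS)
print(clean_acc, hyp_acc, risk)
```

On this toy data the clean accuracy is 0.4, but the accuracy measured on the perturbed inputs is 0.8, because two of the three misclassified points lie within `eps` of the decision boundary. This gap is the kind of inflated evaluation result the abstract warns about.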
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
