Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

Tianyu Pang; Huishuai Zhang; Di He; Yinpeng Dong; Hang Su; Wei Chen; Jun Zhu; Tie-Yan Liu

arXiv:2105.14785 · cs.LG · April 1, 2022 · 1 citation

Open Access · 1 repository

TL;DR

This paper couples two rejection metrics, confidence and rectified confidence (R-Con), to distinguish adversarial examples from correctly classified inputs, improving robustness with little additional computation.

Contribution

The paper proposes a rectified rejection (RR) module that can be added to existing models and improves adversarial example detection and robustness across multiple datasets and attack types.

Findings

01. The RR module improves detection of adversarial examples.

02. It is compatible with various adversarial training frameworks.

03. It remains effective under adaptive attacks.

Abstract

Correctly classifying adversarial examples is an essential but challenging requirement for safely deploying machine learning models. As reported in RobustBench, even state-of-the-art adversarially trained models struggle to exceed 67% robust test accuracy on CIFAR-10, which is far from practical. A complementary route to robustness is to introduce a rejection option, allowing the model to abstain from predicting on uncertain inputs, with confidence as a commonly used proxy for certainty. Following this line, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which can provably distinguish wrongly classified inputs from correctly classified ones. This intriguing property motivates using coupling strategies to better detect and reject adversarial examples. We evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C,…
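The abstract says confidence and R-Con can provably separate wrongly classified inputs from correctly classified ones, but this page does not define R-Con. The Python sketch below illustrates how such a guarantee can arise under a loud assumption: an idealized (oracle) R-Con equal to the probability the classifier assigns to the true label. If that probability exceeds 1/2, no other class can outscore it, so any input accepted at threshold 1/2 must be correctly classified. The helpers `idealized_rcon` and `accept` and the 0.5 threshold are hypothetical illustrations; the paper's learned RR module, its training objective, and its exact rejection rule are not reproduced here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: turns raw logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs: List[float]) -> float:
    """Con(x): probability of the predicted (argmax) class."""
    return max(probs)


def idealized_rcon(probs: List[float], true_label: int) -> float:
    """ASSUMPTION (hypothetical helper): an oracle R-Con equal to the
    probability of the true label. A deployed model cannot see the label
    and would have to estimate this quantity with a learned module,
    which is not shown here."""
    return probs[true_label]


def accept(probs: List[float], rcon: float, threshold: float = 0.5) -> bool:
    """Hypothetical coupled rule: return a prediction only if BOTH
    Con(x) and R-Con(x) exceed the threshold; otherwise reject."""
    return confidence(probs) > threshold and rcon > threshold


if __name__ == "__main__":
    probs = softmax([2.0, 0.5, -1.0])
    predicted = probs.index(max(probs))
    for label in range(len(probs)):
        decision = accept(probs, idealized_rcon(probs, label))
        print(f"true label {label}: predicted {predicted}, accepted={decision}")
```

With an oracle R-Con the Con check is implied, since Con is always at least the true-label probability; coupling the two metrics only adds information when R-Con is an imperfect, learned estimate. In the demo, only the case whose true label equals the predicted class (probability about 0.79) is accepted.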

Code & Models

Repositories

P2333/Rectified-Rejection
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)