ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation

Fan Yin; Yao Li; Cho-Jui Hsieh; Kai-Wei Chang

arXiv:2210.12396·cs.CL·October 25, 2022

TL;DR

ADDMU is an uncertainty-based method for detecting adversarial examples in NLP, including far-boundary adversarial examples that existing detection techniques fail to identify.

Contribution

The paper proposes ADDMU, a detection approach based on data and model uncertainty estimation, and proposes evaluating detectors on far-boundary adversarial examples, where ADDMU outperforms previous methods in AUC.

Findings

01 ADDMU outperforms previous methods by 3.6 and 6.0 AUC points in the regular and far-boundary scenarios, respectively.

02 Existing methods perform worse than random guessing on far-boundary adversarial examples.

03 Uncertainty measures can characterize adversarial examples and help improve model robustness.
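The findings above are stated in terms of detection AUC, where 0.5 corresponds to random guessing. As a minimal sketch of how such a number can be computed, the snippet below uses the pairwise-ranking definition of ROC AUC. The function name `roc_auc` and all detector scores are hypothetical values chosen for illustration; they are not results from the paper.

```python
def roc_auc(scores_adv, scores_clean):
    """ROC AUC as the probability that a random adversarial example gets a
    higher detector score than a random clean one (ties count as half)."""
    wins = 0.0
    for a in scores_adv:
        for c in scores_clean:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(scores_adv) * len(scores_clean))


# Hypothetical detector scores (higher means "more likely adversarial").
clean = [0.10, 0.20, 0.30, 0.40]
near_boundary_adv = [0.35, 0.60, 0.70, 0.90]  # detector separates these well
far_boundary_adv = [0.05, 0.15, 0.25, 0.45]   # detector ranks these below clean

auc_near = roc_auc(near_boundary_adv, clean)
auc_far = roc_auc(far_boundary_adv, clean)
print(auc_near, auc_far)
```

In this toy setup the second AUC falls below 0.5, which is what "worse than random" means for a detector: its scores rank adversarial inputs as less suspicious than clean ones. Assuming AUC is reported on a 0 to 100 scale, a gain of 3.6 AUC points corresponds to 0.036 on the 0 to 1 scale used here.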

Abstract

Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks and has drawn increasing attention from the Natural Language Processing (NLP) community. Despite the surge of new AED methods, our studies show that existing methods heavily rely on a shortcut to achieve good performance. In other words, current search-based adversarial attacks in NLP stop once model predictions change, and thus most adversarial examples generated by those attacks are located near model decision boundaries. To surpass this shortcut and fairly evaluate AED methods, we propose to test AED methods with Far Boundary (FB) adversarial examples. Existing methods show worse than random guess performance under this scenario. To overcome this limitation, we propose a new technique, ADDMU, adversary detection with data and…
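The shortcut described in the abstract comes from the attack's stopping rule. The sketch below illustrates it with a toy greedy word-substitution attack that can either stop as soon as the prediction flips or keep going until the wrong class is predicted with high confidence. Everything here is an assumption made for illustration: the bag-of-words model, the `WEIGHTS` and `SUBSTITUTE` tables, the function names, and the 0.9 confidence threshold are hypothetical and are not the paper's attack or its exact far-boundary construction.

```python
import math

# Hypothetical toy sentiment model: positive-class logit = sum of word weights.
WEIGHTS = {"great": 2.0, "good": 1.25, "fine": 0.5,
           "okay": -0.25, "poor": -1.0, "awful": -2.0}
# Hypothetical substitution table: one allowed replacement per word.
SUBSTITUTE = {"great": "good", "good": "fine", "fine": "okay",
              "okay": "poor", "poor": "awful"}


def negative_prob(tokens):
    """Toy model's confidence in the wrong (negative) class."""
    logit = sum(WEIGHTS[t] for t in tokens)
    return 1.0 - 1.0 / (1.0 + math.exp(-logit))


def greedy_attack(tokens, stop_conf):
    """Substitute one word per step, always taking the replacement that most
    raises wrong-class confidence, until that confidence exceeds stop_conf."""
    tokens = list(tokens)
    steps = 0
    while negative_prob(tokens) <= stop_conf:
        best = None
        for i, word in enumerate(tokens):
            if word not in SUBSTITUTE:
                continue
            candidate = tokens[:i] + [SUBSTITUTE[word]] + tokens[i + 1:]
            if best is None or negative_prob(candidate) > negative_prob(best):
                best = candidate
        if best is None:  # no substitutions left: the attack fails
            return None, steps
        tokens, steps = best, steps + 1
    return tokens, steps


original = ["good", "fine"]  # predicted positive by the toy model
# Standard rule: stop as soon as the prediction flips (confidence > 0.5).
near, near_steps = greedy_attack(original, stop_conf=0.5)
# Far-boundary rule (assumed threshold): keep going until confidence > 0.9.
far, far_steps = greedy_attack(original, stop_conf=0.9)
print(near, near_steps, round(negative_prob(near), 3))
print(far, far_steps, round(negative_prob(far), 3))
```

With the standard rule the search halts at the first input that crosses the decision boundary, so the resulting example sits just past it; raising the stopping threshold forces additional substitutions and yields an example the model misclassifies confidently. A detector that only keys on low prediction confidence would catch the first kind but not the second.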

Code & Models

Repositories

uclanlp/advexdetection-addmu
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling

Methods: Test