SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text

Hoang-Quoc Nguyen-Son; Seira Hidano; Kazuhide Fukushima; Shinsaku Kiyomoto

arXiv:2110.05748·cs.CL·October 14, 2021

TL;DR

This paper introduces SEPP, an ensemble method that estimates similarity of predicted probabilities to detect and defend against adversarial texts by exploiting probability gap patterns.

Contribution

SEPP is a novel ensemble approach that leverages probability similarity estimation to improve adversarial text detection and correction.

Findings

01 SEPP effectively detects adversarial texts across multiple classifiers.

02 SEPP improves classification accuracy on adversarial examples.

03 SEPP demonstrates robustness against various attack types.

Abstract

A classifier either misclassifies or correctly classifies an input text. Misclassified texts are of two kinds: ordinary texts that the classifier predicts incorrectly, and adversarial texts generated to fool the classifier, which is called the victim. The victim misunderstands both kinds, but other classifiers can still recognize them, which induces large gaps in predicted probabilities between the victim and the other classifiers. In contrast, text that the victim classifies correctly is often also predicted correctly by the others and induces small gaps. In this paper, we propose an ensemble model based on similarity estimation of predicted probabilities (SEPP) that exploits the large gaps for misclassified predictions in contrast to the small gaps for correct classifications. SEPP then corrects the incorrect…
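The gap idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gap measure (mean absolute difference of class probabilities), the `threshold` value, the fallback to the supporting classifiers' average, and the function names `probability_gap` and `flag_and_correct` are all assumptions made for illustration. The abstract does not specify how SEPP estimates similarity or how it corrects predictions.

```python
from typing import List, Sequence, Tuple


def probability_gap(victim: Sequence[float],
                    others: Sequence[Sequence[float]]) -> float:
    """Hypothetical gap measure: mean absolute difference between the
    victim's class probabilities and each supporting classifier's."""
    if not others:
        raise ValueError("need at least one supporting classifier")
    total = 0.0
    for probs in others:
        if len(probs) != len(victim):
            raise ValueError("probability vectors must have the same length")
        total += sum(abs(v - p) for v, p in zip(victim, probs)) / len(victim)
    return total / len(others)


def flag_and_correct(victim: Sequence[float],
                     others: Sequence[Sequence[float]],
                     threshold: float = 0.3) -> Tuple[bool, int, float]:
    """Flag an input as suspicious when the gap exceeds `threshold`
    (an assumed value), then pick a label.

    Suspicious inputs take the label favored by the averaged supporting
    classifiers (an assumed correction rule); others keep the victim's label.
    """
    gap = probability_gap(victim, others)
    suspicious = gap > threshold
    if suspicious:
        n = len(others)
        source: List[float] = [
            sum(p[i] for p in others) / n for i in range(len(victim))
        ]
    else:
        source = list(victim)
    label = max(range(len(source)), key=source.__getitem__)
    return suspicious, label, gap


if __name__ == "__main__":
    # Victim is confident in class 0, but both supporting classifiers disagree.
    print(flag_and_correct([0.9, 0.1], [[0.2, 0.8], [0.1, 0.9]]))
    # All classifiers roughly agree, so the gap is small.
    print(flag_and_correct([0.9, 0.1], [[0.85, 0.15], [0.95, 0.05]]))
```

In the first call the victim and the supporting classifiers disagree strongly, so the input is flagged and relabeled; in the second they agree and the victim's label is kept. A fixed threshold is the simplest possible decision rule; a learned model over the probability vectors would be a natural alternative under the same assumptions.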

Code & Models

Repositories

quocnsh/sepp (Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Misinformation and Its Impacts