Understanding, Detecting, and Separating Out-of-Distribution Samples and Adversarial Samples in Text Classification

Cheng-Han Chiang; Hung-yi Lee

arXiv:2204.04458 · cs.CL · April 12, 2022 · 1 citation

TL;DR

This paper analyzes the differences between out-of-distribution and adversarial samples in text classification, revealing their distinct behaviors across model layers and proposing a method to effectively distinguish among them.

Contribution

The paper provides a detailed analysis of OOD and Adv samples in text classification and introduces a simple, effective method for separating these anomalies based on model representations.

Findings

01 OOD samples show anomalies from the first layer

02 Adv samples exhibit abnormalities in deeper layers

03 Proposed method effectively distinguishes ID, OOD, and Adv samples

Abstract

In this paper, we study the differences and commonalities between statistically out-of-distribution (OOD) samples and adversarial (Adv) samples, both of which hurt a text classification model's performance. We conduct analyses to compare the two types of anomalies (OOD and Adv samples) with the in-distribution (ID) ones from three aspects: the input features, the hidden representations in each layer of the model, and the output probability distributions of the classifier. We find that OOD samples expose their aberration starting from the first layer, while the abnormalities of Adv samples do not emerge until the deeper layers of the model. We also illustrate that the models' output probabilities for Adv samples tend to be less confident. Based on our observations, we propose a simple method to separate ID, OOD, and Adv samples using the hidden representations and output…
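
The abstract is cut off before the method is described, so the following is only an illustrative sketch of how the two reported observations could be combined, not the authors' procedure. It assumes that early-layer anomalies can be scored with a Mahalanobis distance to ID features, and that low classifier confidence can be read from the maximum softmax probability. The function names, the distance measure, and both thresholds are hypothetical choices made for this example.

```python
import numpy as np

def fit_gaussian(id_feats):
    """Estimate mean and inverse covariance of ID features (rows = samples)."""
    mean = id_feats.mean(axis=0)
    # Small ridge term keeps the covariance invertible (an assumption of this sketch).
    cov = np.cov(id_feats, rowvar=False) + 1e-6 * np.eye(id_feats.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis(feats, mean, cov_inv):
    """Mahalanobis distance of each row of feats to the fitted ID Gaussian."""
    diff = feats - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

def separate(first_layer_feats, probs, mean, cov_inv, dist_threshold, conf_threshold):
    """Hypothetical two-step rule: a far-away first-layer feature means OOD;
    otherwise a low maximum class probability means Adv; otherwise ID."""
    dist = mahalanobis(first_layer_feats, mean, cov_inv)
    max_prob = probs.max(axis=1)
    labels = np.full(len(dist), "ID", dtype=object)
    labels[max_prob < conf_threshold] = "Adv"
    labels[dist > dist_threshold] = "OOD"  # the OOD check takes precedence
    return labels

# Synthetic stand-ins for first-layer features and classifier outputs.
rng = np.random.default_rng(0)
id_feats = rng.normal(size=(500, 4))
mean, cov_inv = fit_gaussian(id_feats)
dist_threshold = np.percentile(mahalanobis(id_feats, mean, cov_inv), 99)

test_feats = np.array([[0.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.0],
                       [10.0, 10.0, 10.0, 10.0]])
test_probs = np.array([[0.95, 0.05],
                       [0.55, 0.45],
                       [0.90, 0.10]])
labels = separate(test_feats, test_probs, mean, cov_inv, dist_threshold, 0.7)
print(list(labels))  # -> ['ID', 'Adv', 'OOD']
```

In practice the features would come from a real model's first hidden layer and the thresholds would be tuned on held-out ID data; the synthetic arrays here only exercise the decision rule.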

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Anomaly Detection Techniques and Applications