Understanding, Detecting, and Separating Out-of-Distribution Samples and Adversarial Samples in Text Classification
Cheng-Han Chiang, Hung-yi Lee

TL;DR
This paper analyzes the differences between out-of-distribution (OOD) and adversarial (Adv) samples in text classification, showing that they behave differently across model layers, and proposes a method to separate in-distribution, OOD, and Adv samples.
Contribution
The paper provides a detailed analysis of OOD and Adv samples in text classification and introduces a simple, effective method for separating these anomalies based on model representations.
Findings
OOD samples show anomalies from the first layer
Adv samples exhibit abnormalities in deeper layers
Proposed method effectively distinguishes ID, OOD, and Adv samples
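The findings above suggest a two-signal check, which can be sketched as follows. This is a minimal illustration and not the authors' method: the Mahalanobis distance on first-layer features, the maximum softmax probability as the confidence score, and all names and thresholds (`fit_gaussian`, `classify_samples`, `dist_threshold`, `conf_threshold`) are assumptions introduced here.

```python
import numpy as np

def fit_gaussian(features):
    """Estimate mean and inverse covariance of in-distribution (ID) features."""
    mean = features.mean(axis=0)
    # Small ridge term keeps the covariance invertible (assumed value).
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of each row of x from the ID mean."""
    diff = x - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

def classify_samples(first_layer, probs, mean, cov_inv,
                     dist_threshold, conf_threshold):
    """Label each sample as 'ID', 'OOD', or 'Adv' (hypothetical rule).

    first_layer: (n, d) first-layer representations
    probs:       (n, c) classifier output probabilities
    """
    dist = mahalanobis(first_layer, mean, cov_inv)
    max_prob = probs.max(axis=1)
    labels = np.full(len(dist), "ID", dtype=object)
    # Low-confidence outputs are flagged as adversarial candidates.
    labels[max_prob < conf_threshold] = "Adv"
    # Samples far from ID data at the first layer are flagged as OOD;
    # this check is applied last, so it overrides the confidence check.
    labels[dist > dist_threshold] = "OOD"
    return labels

# Synthetic demonstration data, not taken from the paper.
rng = np.random.default_rng(0)
id_features = rng.normal(size=(500, 4))
mean, cov_inv = fit_gaussian(id_features)

test_features = np.array([[0.0, 0.0, 0.0, 0.0],
                          [10.0, 10.0, 10.0, 10.0],
                          [0.1, 0.0, 0.0, 0.0]])
test_probs = np.array([[0.95, 0.05],
                       [0.90, 0.10],
                       [0.55, 0.45]])
labels = classify_samples(test_features, test_probs, mean, cov_inv,
                          dist_threshold=5.0, conf_threshold=0.7)
```

In practice both thresholds would need to be tuned on held-out data, and the sketch ignores the deeper-layer representations where the paper reports that Adv abnormalities emerge.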
Abstract
In this paper, we study the differences and commonalities between statistically out-of-distribution (OOD) samples and adversarial (Adv) samples, both of which hurt a text classification model's performance. We conduct analyses to compare the two types of anomalies (OOD and Adv samples) with the in-distribution (ID) ones from three aspects: the input features, the hidden representations in each layer of the model, and the output probability distributions of the classifier. We find that OOD samples expose their aberration starting from the first layer, while the abnormalities of Adv samples do not emerge until the deeper layers of the model. We also illustrate that the models' output probabilities for Adv samples tend to be less confident. Based on our observations, we propose a simple method to separate ID, OOD, and Adv samples using the hidden representations and output…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Anomaly Detection Techniques and Applications
