DATE: Detecting Anomalies in Text via Self-Supervision of Transformers

Andrei Manolache; Florin Brad; Elena Burceanu

arXiv:2104.05591 · cs.CL · April 13, 2021

TL;DR

This paper introduces DATE, a self-supervised, transformer-based model for text anomaly detection. It combines token-level and sequence-level signals and achieves state-of-the-art results in both the semi-supervised and unsupervised settings.

Contribution

The paper presents a new self-supervised pretext task for text anomaly detection with transformers, combining token-level and sequence-level signals for improved performance.

Findings

01 In the semi-supervised setting, outperforms the state of the art by +13.5% and +6.9% AUROC.

02 Surpasses all other methods in the unsupervised setting, even with 10% of its training data contaminated.

03 Detects anomalies effectively on the 20Newsgroups and AG News datasets.

Abstract

Leveraging deep learning models for Anomaly Detection (AD) has seen widespread use in recent years due to superior performances over traditional methods. Recent deep methods for anomalies in images learn better features of normality in an end-to-end self-supervised setting. These methods train a model to discriminate between different transformations applied to visual data and then use the output to compute an anomaly score. We use this approach for AD in text, by introducing a novel pretext task on text sequences. We learn our DATE model end-to-end, enforcing two independent and complementary self-supervision signals, one at the token-level and one at the sequence-level. Under this new task formulation, we show strong quantitative and qualitative results on the 20Newsgroups and AG News datasets. In the semi-supervised setting, we outperform state-of-the-art results by +13.5% and +6.9%,…
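The abstract describes the pretext task only at a high level, so the following is a minimal sketch of how such a task could be set up, not the authors' implementation. It assumes that the sequence-level label is the index of a mask pattern applied to the input and that the token-level label marks which tokens were replaced. Replacements are drawn at random from a toy vocabulary as a stand-in for whatever the model actually uses. The helper names (`make_mask_patterns`, `corrupt`, `anomaly_score`) and the scoring rule are hypothetical, the latter borrowed from the transformation-discrimination idea mentioned for images.

```python
import random

def make_mask_patterns(seq_len, num_patterns, mask_fraction, seed=0):
    """Build a fixed set of mask patterns (hypothetical helper).

    Each pattern is a sorted tuple of positions that will be corrupted.
    """
    rng = random.Random(seed)
    num_masked = max(1, int(seq_len * mask_fraction))
    return [tuple(sorted(rng.sample(range(seq_len), num_masked)))
            for _ in range(num_patterns)]

def corrupt(tokens, patterns, vocab, rng):
    """Apply one randomly chosen pattern and return both kinds of labels.

    Returns (corrupted_tokens, pattern_id, token_labels):
      pattern_id   -> assumed sequence-level target (which pattern was used)
      token_labels -> assumed token-level target (1 where a token changed)
    """
    pattern_id = rng.randrange(len(patterns))
    corrupted = list(tokens)
    token_labels = [0] * len(tokens)
    for pos in patterns[pattern_id]:
        replacement = rng.choice(vocab)  # random stand-in for a learned replacement
        corrupted[pos] = replacement
        # Label is 1 only if the sampled token differs from the original.
        token_labels[pos] = int(replacement != tokens[pos])
    return corrupted, pattern_id, token_labels

def anomaly_score(pattern_probs, true_pattern_id):
    """Assumed scoring rule: 1 minus the probability the discriminator
    assigns to the pattern that was really applied."""
    return 1.0 - pattern_probs[true_pattern_id]

# Usage with a toy sentence and vocabulary.
rng = random.Random(1)
tokens = ["the", "match", "ended", "in", "a", "draw"]
vocab = tokens + ["stock", "market"]
patterns = make_mask_patterns(len(tokens), num_patterns=4, mask_fraction=0.5)
corrupted, pattern_id, token_labels = corrupt(tokens, patterns, vocab, rng)
```

The intuition behind this kind of scoring is that a discriminator trained only on normal text should recognize the applied transformation confidently on normal inputs and less confidently on anomalous ones, so a higher score suggests an anomaly.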

Code & Models

Repositories

bit-ml/date · PyTorch · Official
