Oddballness: universal anomaly detection with language models

Filip Graliński; Ryszard Staruch; Krzysztof Jurkiewicz

arXiv:2409.03046 · cs.CL · September 6, 2024



TL;DR

This paper introduces "oddballness", a novel unsupervised metric for detecting anomalies in text with language models. Oddballness measures how "strange" a token is according to the model, and in grammatical error detection it outperforms methods that simply flag low-likelihood tokens.

Contribution

The paper proposes a new anomaly detection metric, oddballness, which improves unsupervised grammatical error detection with language models.

Findings

01

Oddballness outperforms likelihood-based methods in grammatical error detection when a fully unsupervised setup is assumed.

02

The method is fully unsupervised and applies to sequences of any kind of data, not only text.

03

Its effectiveness is demonstrated on grammatical error detection, a specific case of text anomaly detection.
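Because the approach only needs a model that assigns a probability distribution to each next item, it is not tied to text. The following is a minimal sketch, assuming a hypothetical hand-written bigram model over log events; the event names, probabilities, threshold, and the scoring function itself are illustrative assumptions, not taken from the paper. The score used here (how much probability the alternatives hold above the observed item) is a stand-in for the paper's oddballness definition, which is not reproduced on this page.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy "language model" over non-text events: for each previous
# event, a probability distribution over the next event. A real system would
# use a trained sequence model; these numbers are made up for illustration.
BIGRAM: Dict[str, Dict[str, float]] = {
    "login": {"read": 0.75, "write": 0.25},
    "read": {"read": 0.5, "write": 0.25, "logout": 0.25},
    "write": {"read": 0.5, "write": 0.25, "logout": 0.25},
}


def oddballness(dist: Dict[str, float], observed: str) -> float:
    """Hypothetical stand-in score (not the paper's exact formula): total amount
    by which other outcomes' probabilities exceed that of the observed item.
    It is 0 for the model's top choice; unseen items get probability 0."""
    p_obs = dist.get(observed, 0.0)
    return sum(max(0.0, p - p_obs) for p in dist.values())


def flag_anomalies(seq: Sequence[str], threshold: float) -> List[Tuple[int, str, float]]:
    """Score every item after the first against the distribution predicted
    from its predecessor and return (position, item, score) above threshold."""
    flagged: List[Tuple[int, str, float]] = []
    for pos in range(1, len(seq)):
        dist = BIGRAM[seq[pos - 1]]  # KeyError if the predecessor is unmodelled
        score = oddballness(dist, seq[pos])
        if score >= threshold:
            flagged.append((pos, seq[pos], score))
    return flagged


if __name__ == "__main__":
    events = ["login", "read", "write", "login", "read"]
    # A second "login" right after "write" is something the toy model never predicts.
    print(flag_anomalies(events, threshold=0.5))
```

The threshold plays the role that a likelihood cutoff would play in a baseline detector; choosing it without labelled data is part of what "fully unsupervised" has to address in practice.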

Abstract

We present a new method to detect anomalies in texts (in general: in sequences of any data), using language models, in a totally unsupervised manner. The method considers probabilities (likelihoods) generated by a language model, but instead of focusing on low-likelihood tokens, it considers a new metric introduced in this paper: oddballness. Oddballness measures how "strange" a given token is according to the language model. We demonstrate in grammatical error detection tasks (a specific case of text anomaly detection) that oddballness is better than just considering low-likelihood events, if a totally unsupervised setup is assumed.
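The abstract states the idea but not the formula. The sketch here only illustrates why a score that looks at the whole predicted distribution can disagree with raw likelihood. The `oddballness` function is a hypothetical stand-in (the summed excess probability of the other outcomes), not necessarily the authors' definition, and the two distributions are invented for the example.

```python
import math
from typing import Sequence


def neg_log_likelihood(probs: Sequence[float], i: int) -> float:
    """Likelihood-based baseline: the rarer outcome i, the higher the score."""
    return -math.log(probs[i])


def oddballness(probs: Sequence[float], i: int) -> float:
    """Hypothetical distribution-aware score (an assumption, not the paper's
    formula): total amount by which other outcomes' probabilities exceed p_i.
    It is 0 for the model's most likely outcome and for any outcome of a
    uniform distribution."""
    p_i = probs[i]
    return sum(max(0.0, p_j - p_i) for p_j in probs)


# Uniform over 8 outcomes: nothing is unexpected, every outcome has p = 0.125.
flat = [0.125] * 8
# Peaked: the model strongly expects outcome 0; outcome 1 also has p = 0.125.
peaked = [0.875, 0.125]

# The likelihood baseline gives the same score in both situations ...
print(neg_log_likelihood(flat, 3), neg_log_likelihood(peaked, 1))
# ... while the oddballness-style score separates them: 0.0 for flat, 0.75 for peaked.
print(oddballness(flat, 3), oddballness(peaked, 1))
```

A token with probability 0.125 is unremarkable when the model had no strong expectation, but stands out when the model was confident in something else; a pure low-likelihood criterion treats both cases identically.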


Taxonomy

Topics: Spam and Phishing Detection · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques