Towards Token-Level Text Anomaly Detection
Yang Cao, Bicheng Yu, Sikun Yang, Ming Liu, Yujiu Yang

TL;DR
This paper introduces token-level text anomaly detection, enabling precise localization of anomalies within texts, and provides benchmark datasets and a unified framework that outperforms existing document-level methods.
Contribution
It presents the first formal definition and unified framework for token-level anomaly detection, along with annotated datasets and improved performance over baselines.
Findings
Outperforms six baseline methods
Provides annotated datasets for token-level anomalies
Enables fine-grained anomaly localization
Abstract
Despite significant progress in text anomaly detection for web applications such as spam filtering and fake news detection, existing methods are fundamentally limited to document-level analysis and cannot identify which specific parts of a text are anomalous. We introduce token-level anomaly detection, a novel paradigm that enables fine-grained localization of anomalies within text. We formally define text anomalies at both the document and token levels, and propose a unified detection framework that operates across multiple levels. To facilitate research in this direction, we collect and annotate three benchmark datasets spanning spam, reviews, and grammar errors with token-level labels. Experimental results demonstrate that our framework outperforms six baselines, opening new possibilities for precise anomaly localization in text. All code and data are publicly…
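To make the document-level versus token-level distinction concrete, here is a minimal sketch of how token-level labels might be scored and rolled up into a document-level label. It is not the paper's framework. The function names `document_label` and `token_prf`, the binary 0/1 label encoding, the "any anomalous token makes the document anomalous" rule, and the toy sentence are all assumptions made for illustration.

```python
from typing import List, Tuple


def document_label(token_labels: List[int]) -> int:
    """Assumed convention (not from the paper): a document is anomalous (1)
    if at least one of its tokens is labeled anomalous."""
    return int(any(token_labels))


def token_prf(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for binary labels (1 = anomalous)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy example: one spam-like message with per-token labels.
tokens = ["Win", "a", "free", "prize", "now"]
gold = [0, 0, 1, 1, 0]   # annotated anomalous tokens: "free", "prize"
pred = [1, 0, 1, 1, 0]   # a detector that also flags "Win"

precision, recall, f1 = token_prf(gold, pred)
doc_gold = document_label(gold)
doc_pred = document_label(pred)
```

In this toy case a document-level evaluation would count the prediction as fully correct, while the token-level scores expose the one falsely flagged token. That gap is the kind of information fine-grained localization is meant to surface.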
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Advanced Malware Detection Techniques
