Towards Token-Level Text Anomaly Detection

Yang Cao; Bicheng Yu; Sikun Yang; Ming Liu; Yujiu Yang

arXiv:2601.13644 · cs.CL · January 21, 2026



Open Access

TL;DR

This paper introduces token-level text anomaly detection, enabling precise localization of anomalies within texts, and provides benchmark datasets and a unified framework that outperforms existing document-level methods.

Contribution

The paper presents the first formal definition and a unified framework for token-level anomaly detection, releases annotated datasets, and reports improved performance over baselines.

Findings

01

Outperforms six baseline methods

02

Provides three benchmark datasets annotated with token-level anomaly labels

03

Enables fine-grained anomaly localization

Abstract

Despite significant progress in text anomaly detection for web applications such as spam filtering and fake news detection, existing methods are fundamentally limited to document-level analysis and cannot identify which specific parts of a text are anomalous. We introduce token-level anomaly detection, a novel paradigm that enables fine-grained localization of anomalies within text. We formally define text anomalies at both the document and token levels, and propose a unified detection framework that operates across multiple levels. To facilitate research in this direction, we collect and annotate three benchmark datasets, spanning spam, reviews, and grammar errors, with token-level labels. Experimental results demonstrate that our framework outperforms six baselines, opening new possibilities for precise anomaly localization in text. All code and data are publicly…
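To make the document-level versus token-level distinction concrete, here is a minimal Python sketch of how token-level labels, a derived document label, and a token-level metric might fit together. It is not the authors' framework or their formal definition: the binary per-token labels, the rule that any anomalous token makes the document anomalous, max-pooling of token scores, the 0.5 threshold, and every name (`TokenExample`, `document_label`, `document_score`, `threshold`, `token_f1`) are hypothetical assumptions chosen for illustration. The example sentence and detector scores are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenExample:
    tokens: List[str]
    labels: List[int]  # hypothetical scheme: 1 = anomalous token, 0 = normal


def document_label(token_labels: List[int]) -> int:
    """Assumed rule: a document is anomalous if any of its tokens is anomalous."""
    return int(any(token_labels))


def document_score(token_scores: List[float]) -> float:
    """Max-pool per-token anomaly scores into one document score (an assumption)."""
    return max(token_scores) if token_scores else 0.0


def threshold(scores: List[float], tau: float = 0.5) -> List[int]:
    """Turn per-token scores into binary predictions with a fixed cutoff tau."""
    return [int(s >= tau) for s in scores]


def token_f1(gold: List[int], pred: List[int]) -> float:
    """F1 on the anomalous class, computed over individual tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical grammar-error example: "go" should be "went".
example = TokenExample(
    tokens=["He", "go", "to", "school", "yesterday", "."],
    labels=[0, 1, 0, 0, 0, 0],
)
scores = [0.1, 0.9, 0.2, 0.1, 0.6, 0.0]  # made-up detector outputs
pred = threshold(scores)

print(pred)                                      # [0, 1, 0, 0, 1, 0]
print(document_label(example.labels))            # 1
print(document_score(scores))                    # 0.9
print(round(token_f1(example.labels, pred), 3))  # 0.667
```

Max-pooling is one simple way to tie the two levels together: a single strongly anomalous token is enough to flag the whole document, while the per-token predictions still show where the anomaly sits. Mean or top-k pooling would be equally reasonable choices; the abstract does not say which aggregation, if any, the paper uses.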

Taxonomy

Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Advanced Malware Detection Techniques