TAD-Bench: A Comprehensive Benchmark for Embedding-Based Text Anomaly Detection

Yang Cao; Sikun Yang; Chen Li; Haolong Xiang; Lianyong Qi; Bo Liu; Rongsheng Li; Ming Liu

arXiv:2501.11960 · cs.CL · May 26, 2025

Open Access

TL;DR

TAD-Bench is a comprehensive benchmark that evaluates embedding-based text anomaly detection methods across diverse datasets and algorithms, providing insights into their effectiveness and generalizability.

Contribution

This paper introduces TAD-Bench, a new benchmark integrating multiple datasets and state-of-the-art embeddings to systematically evaluate text anomaly detection approaches.

Findings

01 Analyzes the performance of various embeddings and detection algorithms

02 Identifies strengths and weaknesses of current methods

03 Provides guidance for building robust anomaly detection systems

Abstract

Text anomaly detection is crucial for identifying spam, misinformation, and offensive language in natural language processing tasks. Despite the growing adoption of embedding-based methods, their effectiveness and generalizability across diverse application scenarios remain under-explored. To address this, we present TAD-Bench, a comprehensive benchmark designed to systematically evaluate embedding-based approaches for text anomaly detection. TAD-Bench integrates multiple datasets spanning different domains, combining state-of-the-art embeddings from large language models with a variety of anomaly detection algorithms. Through extensive experiments, we analyze the interplay between embeddings and detection methods, uncovering their strengths, weaknesses, and applicability to different tasks. These findings offer new perspectives on building more robust, efficient, and generalizable…
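
The pipeline the abstract describes (embed each text, then score the embeddings with an anomaly detector) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the benchmark's actual setup: `toy_embed` is a hypothetical bag-of-words stand-in for an LLM embedding, `knn_scores` is a plain k-nearest-neighbor distance score standing in for the detection algorithms, and the five texts are invented.

```python
import math

def build_vocab(texts):
    """Map each distinct lowercase token to a vector index."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def toy_embed(text, vocab):
    """Hypothetical stand-in for an LLM embedding: L2-normalized bag of words."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def knn_scores(embeddings, k=2):
    """Anomaly score: mean Euclidean distance to the k nearest other points."""
    scores = []
    for i, a in enumerate(embeddings):
        dists = sorted(math.dist(a, b) for j, b in enumerate(embeddings) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Invented example corpus: four similar messages and one spam-like outlier.
texts = [
    "the meeting is moved to monday morning",
    "the meeting is moved to friday morning",
    "the meeting is moved to tuesday afternoon",
    "the meeting is moved to monday afternoon",
    "win free cash prizes click now",
]

vocab = build_vocab(texts)
scores = knn_scores([toy_embed(t, vocab) for t in texts], k=2)
most_anomalous = max(range(len(texts)), key=scores.__getitem__)
print(texts[most_anomalous])  # the spam-like message gets the highest score
```

In a realistic setting the embedding function would be replaced by a pretrained model and the scorer by any detector that operates on fixed-length vectors; the two stages stay independent, which is what makes the embedding/detector pairings comparable.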

Taxonomy

Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Spam and Phishing Detection