AD-LLM: Benchmarking Large Language Models for Anomaly Detection

Tiankai Yang; Yi Nian; Shawn Li; Ruiyao Xu; Yuangang Li; Jiaqi Li; Zhuo Xiao; Xiyang Hu; Ryan Rossi; Kaize Ding; Xia Hu; Yue Zhao

arXiv:2412.11142 · cs.CL · October 13, 2025 · 3 cites

TL;DR

This paper introduces AD-LLM, a benchmark for evaluating large language models (LLMs) on NLP anomaly detection across three tasks (zero-shot detection, data augmentation, and model selection), and reports where LLMs are effective and where challenges remain.

Contribution

AD-LLM is the first benchmark to systematically evaluate LLMs for NLP anomaly detection across multiple tasks and datasets, highlighting both their potential and their limitations.

Findings

1. LLMs perform well in zero-shot anomaly detection
2. Data augmentation with synthetic data improves detection accuracy
3. Explaining model selection remains a significant challenge

Abstract

Anomaly detection (AD) is an important machine learning task with many real-world uses, including fraud detection, medical diagnosis, and industrial monitoring. Within natural language processing (NLP), AD helps detect issues such as spam, misinformation, and unusual user activity. Although large language models (LLMs) have had a strong impact on tasks such as text generation and summarization, their potential for AD remains understudied. This paper introduces AD-LLM, the first benchmark that evaluates how LLMs can help with NLP anomaly detection. We examine three key tasks: (i) zero-shot detection, using LLMs' pre-trained knowledge to perform AD without task-specific training; (ii) data augmentation, generating synthetic data and category descriptions to improve AD models; and (iii) model selection, using LLMs to suggest unsupervised AD models. Through experiments with different…
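
The zero-shot setting in task (i) can be illustrated with a minimal sketch, assuming the LLM is told which categories count as normal, is asked for a numeric anomaly score per text, and the scores are then evaluated with AUROC, a common AD metric. The prompt wording, the 0-to-1 score format, the 0.5 fallback, and `fake_llm` (a hypothetical keyword-based stand-in for a real model call) are illustrative assumptions, not the benchmark's actual prompts or code.

```python
"""Sketch of zero-shot text anomaly detection with an LLM (illustrative only).

Assumptions, not taken from AD-LLM: the prompt text, the 0-1 score format,
and `fake_llm`, a hypothetical stand-in for a real LLM call.
"""
import re
from typing import Callable, List


def build_prompt(text: str, normal_categories: List[str]) -> str:
    # The model is told which categories are "normal"; anything else is anomalous.
    cats = ", ".join(normal_categories)
    return (
        f"Normal texts belong to these categories: {cats}.\n"
        f"Text: {text}\n"
        "Reply with a single anomaly score between 0 (normal) and 1 (anomalous)."
    )


def parse_score(reply: str) -> float:
    # Take the first non-negative number in the reply and clamp it to [0, 1];
    # fall back to 0.5 (uninformative) when the reply contains no number.
    match = re.search(r"\d*\.?\d+", reply)
    if match is None:
        return 0.5
    return min(max(float(match.group()), 0.0), 1.0)


def zero_shot_detect(
    texts: List[str], normal_categories: List[str], llm: Callable[[str], str]
) -> List[float]:
    # No training step: each text is scored purely from the prompt.
    return [parse_score(llm(build_prompt(t, normal_categories))) for t in texts]


def auroc(scores: List[float], labels: List[int]) -> float:
    # AUROC = probability that a random anomaly (label 1) scores higher than a
    # random normal sample (label 0); ties count as half a win.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM: low score if the text mentions a
    # sports or politics keyword, high score otherwise.
    text = prompt.split("Text: ", 1)[1].split("\n", 1)[0].lower()
    return "0.1" if ("match" in text or "election" in text) else "0.9"


if __name__ == "__main__":
    texts = ["The match ended 2-1.", "Buy cheap pills now!", "The election results are in."]
    labels = [0, 1, 0]  # 1 marks the anomaly (spam outside the normal categories)
    scores = zero_shot_detect(texts, ["sports", "politics"], fake_llm)
    print(scores)                 # [0.1, 0.9, 0.1]
    print(auroc(scores, labels))  # 1.0
```

Asking for a bounded numeric score rather than a yes/no label keeps the output rankable, which threshold-free metrics such as AUROC require. Replacing `fake_llm` with a function that sends the prompt to an actual model is the only change needed to try the same loop against a real LLM.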

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Topic Modeling