CodeAD: Synthesize Code of Rules for Log-based Anomaly Detection with LLMs
Junjie Huang, Minghua He, Jinyang Liu, Yintong Huo, Domenico Bianculli, Michael R. Lyu

TL;DR
CodeAD leverages large language models to automatically generate interpretable, lightweight Python rules for log-based anomaly detection, improving accuracy, efficiency, and scalability over existing methods.
Contribution
This paper introduces a novel framework that automatically synthesizes rule-based anomaly detection functions using LLMs, combining interpretability with high performance and scalability.
Findings
Achieves 3.6% higher F1 score on benchmark datasets.
Processes datasets up to 4 times faster than baselines.
Maintains low LLM costs under 4 USD per dataset.
Abstract
Log-based anomaly detection (LogAD) is critical for maintaining the reliability and availability of large-scale online service systems. While machine learning, deep learning, and large language model (LLM)-based methods have advanced LogAD, they often suffer from limited interpretability, high inference costs, and extensive preprocessing requirements, which limits their practicality for real-time, high-volume log analysis. In contrast, rule-based systems offer efficiency and transparency, but they require significant manual effort and are difficult to scale across diverse and evolving environments. In this paper, we present CodeAD, a novel framework that automatically synthesizes lightweight Python rule functions for LogAD using LLMs. CodeAD introduces a hierarchical clustering and anchor-grounded sampling strategy to construct representative contrastive log windows, enabling LLMs to…
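The abstract does not show what a synthesized rule function looks like. The sketch below is a rough illustration only and is not taken from CodeAD. It assumes that a log window is a list of raw log lines and that a rule is a plain Python predicate over that window. The names `is_anomalous` and `detect`, the keyword pattern, and the `max_errors` threshold are all hypothetical.

```python
import re
from typing import List

# Hypothetical keyword pattern. A rule actually synthesized by an LLM would be
# derived from the contrastive normal/anomalous windows it was shown.
_ERROR_PATTERN = re.compile(r"\b(exception|error|fatal|failed)\b", re.IGNORECASE)


def is_anomalous(window: List[str], max_errors: int = 0) -> bool:
    """Return True if the window looks anomalous under this toy rule.

    Illustrative assumption: a window is anomalous when more than
    `max_errors` of its lines contain an error keyword.
    """
    error_lines = sum(1 for line in window if _ERROR_PATTERN.search(line))
    return error_lines > max_errors


def detect(windows: List[List[str]]) -> List[bool]:
    """Apply the rule to every window using plain string matching, with no model inference."""
    return [is_anomalous(w) for w in windows]


if __name__ == "__main__":
    normal = ["Receiving block blk_1 src: /10.0.0.1",
              "PacketResponder 1 for block blk_1 terminating"]
    faulty = ["Receiving block blk_2 src: /10.0.0.2",
              "Exception in receiveBlock for block blk_2"]
    print(detect([normal, faulty]))  # prints [False, True]
```

A rule of this kind runs as ordinary Python with no model call per window. That property is the source of the efficiency and interpretability the abstract attributes to rule-based detection.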
