LogPurge: Log Data Purification for Anomaly Detection via Rule-Enhanced Filtering

Shenglin Zhang; Ziang Chen; Zijing Que; Yilun Liu; Yongqian Sun; Sicheng Wei; Dan Pei; Hailin Li

arXiv:2511.14062 · cs.SE · November 19, 2025

Open Access

TL;DR

LogPurge introduces a rule-enhanced, two-stage filtering framework utilizing large language models to automatically purify log data, significantly improving anomaly detection training data quality and detection accuracy.

Contribution

The paper presents a novel, cost-aware purification framework that combines LLMs and system rules with divide-and-conquer strategies to effectively remove anomalies from log data.

Findings

01 Removes 98.74% of anomalies on average

02 Retains 82.39% of normal log samples

03 Achieves up to 149.72% F1 score improvement
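
The three findings above are ratio metrics. As a minimal sketch of how such figures could be computed, the Python below assumes ground-truth labels per log sequence (1 = anomalous, 0 = normal) and a boolean mask of sequences kept after purification. The function names and the exact metric definitions are illustrative assumptions, not taken from the paper.

```python
def purification_metrics(labels, kept):
    """Return (anomaly_removal_rate, normal_retention_rate).

    labels: 1 for an anomalous sequence, 0 for a normal one (assumed ground truth).
    kept:   True if the sequence survives purification, False if filtered out.
    """
    n_anom = sum(1 for y in labels if y == 1)
    n_norm = sum(1 for y in labels if y == 0)
    anom_removed = sum(1 for y, k in zip(labels, kept) if y == 1 and not k)
    norm_kept = sum(1 for y, k in zip(labels, kept) if y == 0 and k)
    removal = anom_removed / n_anom if n_anom else 0.0
    retention = norm_kept / n_norm if n_norm else 0.0
    return removal, retention


def relative_improvement(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Toy data: four normal and two anomalous sequences.
labels = [0, 0, 0, 0, 1, 1]
kept = [True, True, True, False, False, False]
removal, retention = purification_metrics(labels, kept)
```

On this toy data both anomalies are removed and three of four normal sequences are kept. Under this reading, an improvement of 149.72% means the F1 score after purification is roughly 2.5 times the score before it.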

Abstract

Log anomaly detection, which is critical for identifying system failures and preempting security breaches, detects irregular patterns within large volumes of log data, and impacts domains such as service reliability, performance optimization, and database log analysis. Modern log anomaly detection methods rely on training deep learning models on clean, anomaly-free log sequences. However, obtaining such clean log data requires costly and tedious human labeling, and existing automatic cleaning methods fail to fully integrate the specific characteristics and actual semantics of logs in their purification process. In this paper, we propose a cost-aware, rule-enhanced purification framework, LogPurge, that automatically selects a sufficient subset of normal log sequences from contaminated log sequences to train an anomaly detection model. Our approach involves a two-stage filtering…

Taxonomy

Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection