LogPurge: Log Data Purification for Anomaly Detection via Rule-Enhanced Filtering
Shenglin Zhang, Ziang Chen, Zijing Que, Yilun Liu, Yongqian Sun, Sicheng Wei, Dan Pei, Hailin Li

TL;DR
LogPurge introduces a rule-enhanced, two-stage filtering framework utilizing large language models to automatically purify log data, significantly improving anomaly detection training data quality and detection accuracy.
Contribution
The paper presents a novel, cost-aware purification framework that combines LLMs and system rules with divide-and-conquer strategies to effectively remove anomalies from log data.
Findings
Removes 98.74% of anomalies on average
Retains 82.39% of normal log samples
Achieves up to a 149.72% improvement in F1 score
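The three figures above can be read as simple ratios. The sketch below is illustrative only and is not taken from the paper: it assumes the removal rate is the share of anomalous sequences that the filter discards, the retention rate is the share of normal sequences that survive, and the F1 improvement is a relative gain over a baseline. The function names `purification_metrics` and `relative_improvement` and the toy data are hypothetical.

```python
def purification_metrics(labels, kept):
    """Return (anomaly removal rate, normal retention rate).

    labels: 1 for an anomalous sequence, 0 for a normal one (ground truth).
    kept:   True if the sequence survives filtering, False if discarded.
    Both definitions are assumptions made for this illustration.
    """
    if len(labels) != len(kept):
        raise ValueError("labels and kept must have the same length")

    total_anomalies = sum(1 for y in labels if y == 1)
    total_normals = len(labels) - total_anomalies

    removed_anomalies = sum(1 for y, k in zip(labels, kept) if y == 1 and not k)
    retained_normals = sum(1 for y, k in zip(labels, kept) if y == 0 and k)

    # Guard against empty classes to avoid division by zero.
    removal_rate = removed_anomalies / total_anomalies if total_anomalies else 0.0
    retention_rate = retained_normals / total_normals if total_normals else 0.0
    return removal_rate, retention_rate


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Toy data: 2 anomalous and 4 normal sequences.
labels = [1, 1, 0, 0, 0, 0]
kept = [False, True, True, True, True, False]
removal, retention = purification_metrics(labels, kept)
print(f"removal={removal:.2%}, retention={retention:.2%}")
```

Under this reading, a relative improvement above 100% means the F1 score more than doubled compared with the baseline.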
Abstract
Log anomaly detection, which is critical for identifying system failures and preempting security breaches, detects irregular patterns within large volumes of log data, and impacts domains such as service reliability, performance optimization, and database log analysis. Modern log anomaly detection methods rely on training deep learning models on clean, anomaly-free log sequences. However, obtaining such clean log data requires costly and tedious human labeling, and existing automatic cleaning methods fail to fully integrate the specific characteristics and actual semantics of logs in their purification process. In this paper, we propose a cost-aware, rule-enhanced purification framework, LogPurge, that automatically selects a sufficient subset of normal log sequences from contaminated log sequences to train an anomaly detection model. Our approach involves a two-stage filtering…
Taxonomy
Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
