CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations
Benzhao Tang, Shiyu Yang

TL;DR
CLAD is a deep learning framework that detects log anomalies directly on compressed byte streams, removing the decompression and parsing steps from pre-processing while reaching state-of-the-art accuracy.
Contribution
CLAD introduces a new architecture and a two-stage training strategy for log anomaly detection directly on compressed data, bypassing decompression and parsing.
Findings
Achieves an average F1-score of 0.9909 across five datasets.
Outperforms baseline methods by 2.72 percentage points.
Eliminates decompression and parsing overheads in log anomaly detection.
Abstract
The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into regular byte patterns, while anomalies systematically disrupt them. To extract these multi-scale deviations from opaque bytes, we propose a purpose-built architecture integrating a dilated convolutional byte encoder, a hybrid Transformer-mLSTM, and four-way aggregation pooling. This is coupled with a two-stage training strategy of masked pre-training and focal-contrastive fine-tuning to effectively handle severe class imbalance. Evaluated across five datasets, CLAD achieves a state-of-the-art…
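The abstract credits the focal part of the fine-tuning objective with handling class imbalance. As a rough illustration of why a focal term helps, the sketch below implements the standard binary focal loss for a single example. It is not taken from the paper: the function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, and the contrastive term and the way the two are combined are not specified in the text above.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for one example (illustrative sketch).

    p     : predicted probability that the sample is anomalous (0..1)
    y     : ground-truth label, 1 = anomaly, 0 = normal
    gamma : focusing parameter; larger values down-weight easy examples more
    alpha : weight on the positive (anomaly) class
    All defaults are assumptions, not values reported for CLAD.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)            # guard against log(0)
    p_t = p if y == 1 else 1.0 - p             # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy_normal = binary_focal_loss(0.05, 0)   # confident and correct
    missed_anomaly = binary_focal_loss(0.05, 1)  # confident and wrong
    print(f"easy normal:    {easy_normal:.6f}")
    print(f"missed anomaly: {missed_anomaly:.6f}")
```

With `gamma=0` and `alpha=0.5` the function reduces to half the ordinary cross-entropy, which makes the effect of the focusing factor easy to check: the many easy normal windows contribute almost nothing, so the gradient is dominated by the rare, hard anomalies.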
