On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations
Xiaoxue Ma, Huiqi Zou, Pinjia He, Jacky Keung, Yishu Li, Xiao Yu, and Federica Sarro

TL;DR
This paper investigates how data resampling techniques affect deep learning-based log anomaly detection (DLLAD), finding that oversampling and resampling in the raw data space improve detection performance on imbalanced datasets.
Contribution
It provides a comprehensive analysis of data resampling methods in DLLAD, offering practical insights and recommendations for handling class imbalance.
Findings
Oversampling methods outperform undersampling and hybrid approaches.
Resampling in raw data space yields better results than in feature space.
Generating more minority class data improves DLLAD performance.
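To make the oversampling/undersampling distinction concrete, here is a minimal sketch of random resampling applied to raw log sequences (that is, before any feature extraction) at a chosen normal-to-abnormal ratio. It is an illustration only, not the paper's implementation: the function names, the `normal_to_abnormal` parameter, and the toy data are all hypothetical, and random duplication/removal is just one simple instance of each family of methods.

```python
import math
import random
from collections import Counter


def _split(sequences, labels):
    # Label convention assumed here: 0 = normal, 1 = abnormal (minority class).
    normal = [s for s, y in zip(sequences, labels) if y == 0]
    abnormal = [s for s, y in zip(sequences, labels) if y == 1]
    if not abnormal:
        raise ValueError("need at least one abnormal sample")
    return normal, abnormal


def _merge(normal, abnormal, rng):
    data = [(s, 0) for s in normal] + [(s, 1) for s in abnormal]
    rng.shuffle(data)
    return [s for s, _ in data], [y for _, y in data]


def random_oversample(sequences, labels, normal_to_abnormal=1.0, seed=0):
    """Duplicate random abnormal sequences until #normal / #abnormal
    is at most `normal_to_abnormal` (hypothetical parameter)."""
    rng = random.Random(seed)
    normal, abnormal = _split(sequences, labels)
    target = math.ceil(len(normal) / normal_to_abnormal)
    extra = [rng.choice(abnormal) for _ in range(max(0, target - len(abnormal)))]
    return _merge(normal, abnormal + extra, rng)


def random_undersample(sequences, labels, normal_to_abnormal=1.0, seed=0):
    """Drop random normal sequences until #normal / #abnormal
    is at most `normal_to_abnormal`."""
    rng = random.Random(seed)
    normal, abnormal = _split(sequences, labels)
    keep = min(len(normal), int(len(abnormal) * normal_to_abnormal))
    return _merge(rng.sample(normal, keep), abnormal, rng)


# Toy data: 98 normal and 2 abnormal sequences of log-event IDs (made up).
seqs = [["E1", "E2", "E3"]] * 98 + [["E1", "E9"], ["E7", "E9", "E9"]]
labels = [0] * 98 + [1] * 2

over_x, over_y = random_oversample(seqs, labels, normal_to_abnormal=1.0)
under_x, under_y = random_undersample(seqs, labels, normal_to_abnormal=1.0)

print(Counter(over_y))   # 98 normal, 98 abnormal
print(Counter(under_y))  # 2 normal, 2 abnormal
```

Oversampling keeps every normal sequence and grows the training set, whereas undersampling discards most of the normal data. Resampling should be applied to the training split only, so that the test set keeps its original class distribution.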
Abstract
Numerous Deep Learning (DL)-based approaches have gained attention in software Log Anomaly Detection (LAD), yet class imbalance in training data remains a challenge, with anomalies often comprising less than 1% of datasets like Thunderbird. Existing DLLAD methods may underperform in severely imbalanced datasets. Although data resampling has proven effective in other software engineering tasks, it has not been explored in LAD. This study aims to fill this gap by providing an in-depth analysis of the impact of diverse data resampling methods on existing DLLAD approaches from two distinct perspectives. Firstly, we assess the performance of these DLLAD approaches across four datasets with different levels of class imbalance, and we explore the impact of resampling ratios of normal to abnormal data on DLLAD approaches. Secondly, we evaluate the effectiveness of the data resampling methods…
Taxonomy
Topics: Anomaly Detection Techniques and Applications
