Oversampling Log Messages Using a Sequence Generative Adversarial Network for Anomaly Detection and Classification
Amir Farzad, T. Aaron Gulliver

TL;DR
This paper proposes a SeqGAN-based method to generate synthetic log messages for addressing class imbalance, improving anomaly detection and classification accuracy in log data.
Contribution
It introduces a novel approach combining SeqGAN, Autoencoder, and GRU networks to generate and utilize synthetic log data for imbalance mitigation.
Findings
Oversampling improves anomaly detection accuracy
Synthetic log generation enhances classification performance
Model tested on BGL and Openstack datasets
Abstract
Dealing with imbalanced data is one of the main challenges in machine/deep learning algorithms for classification. This issue is more important with log message data as it is typically very imbalanced and negative logs are rare. In this paper, a model is proposed to generate text log messages using a SeqGAN network. Then features are extracted using an Autoencoder and anomaly detection is done using a GRU network. The proposed model is evaluated with two imbalanced log data sets, namely BGL and Openstack. Results are presented which show that oversampling and balancing data increases the accuracy of anomaly detection and classification.
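The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `generate` callable stands in for a trained SeqGAN generator, and the `resample_existing` placeholder simply copies real minority messages (plain random oversampling). The function names, labels, and sample log lines are all hypothetical.

```python
import random
from collections import Counter


def oversample_to_balance(messages, labels, generate, seed=0):
    """Add synthetic samples until every class is as large as the largest one.

    `generate(pool, rng)` is a hypothetical hook: in the paper's setting it
    would be a trained SeqGAN generator producing new minority-class messages.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_messages, out_labels = list(messages), list(labels)
    for label, count in counts.items():
        # Real messages of this class, available to the generator as examples.
        pool = [m for m, l in zip(messages, labels) if l == label]
        for _ in range(target - count):
            out_messages.append(generate(pool, rng))
            out_labels.append(label)
    return out_messages, out_labels


def resample_existing(pool, rng):
    # Placeholder generator (assumption): duplicates a real message
    # instead of synthesizing a new one as a SeqGAN would.
    return rng.choice(pool)


# Toy, made-up log lines with a 4:1 class imbalance.
messages = [
    "instance spawned successfully",
    "heartbeat received from node",
    "request completed in time",
    "cache refreshed",
    "kernel panic on node",
]
labels = ["normal", "normal", "normal", "normal", "anomalous"]

balanced_messages, balanced_labels = oversample_to_balance(
    messages, labels, resample_existing
)
print(Counter(balanced_labels))
```

The balanced set would then be passed to feature extraction and classification (an Autoencoder and a GRU network in the paper); those stages are outside the scope of this sketch.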
Taxonomy
Methods: Gated Recurrent Unit
