Accelerating System Log Processing by Semi-supervised Learning: A Technical Report
Guofu Li, Pengjia Zhu, and Zhiyi Chen

TL;DR
This paper presents a semi-supervised learning approach for large-scale system log analysis that improves processing speed and classification accuracy by minimizing prior knowledge and leveraging a two-stage machine learning method.
Contribution
The paper introduces a novel two-stage semi-supervised method for log classification that operates with minimal prior knowledge, enhancing scalability and accuracy.
Findings
Method improves processing speed
Method increases classification accuracy
Effective on large-scale log data
Abstract
There is an increasing need for more automated system-log analysis tools that can serve large-scale online systems in a timely manner. However, the conventional way of monitoring and classifying log output based on a keyword list does not scale well for complex systems in which code is contributed by a large group of developers, with diverse ways of encoding error messages and often with misleading pre-set labels. In this paper, we propose that the design of a large-scale online log analysis system should follow the "Least Prior Knowledge Principle", in which unsupervised or semi-supervised solutions with minimal prior knowledge of the log should be encoded directly. We therefore report our experience in designing a two-stage machine-learning-based method, in which the system logs are regarded as the output of a quasi-natural language, pre-filtered by a perplexity score threshold, and then undergo a fine-grained…
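The perplexity pre-filter mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the language model, the tokenization, or which side of the threshold is kept, so everything here is an assumption made for illustration: a bigram model with add-one smoothing, whitespace tokenization, and a filter that passes high-perplexity (unusual) lines on to the second stage. The function names `train_bigram`, `perplexity`, and `prefilter` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def train_bigram(lines):
    """Count unigrams and bigrams over whitespace-tokenized log lines."""
    unigrams, bigrams = Counter(), Counter()
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(line, unigrams, bigrams):
    """Per-token perplexity of one line under an add-one-smoothed bigram model."""
    tokens = ["<s>"] + line.split() + ["</s>"]
    vocab = len(unigrams) + 1  # one extra slot for unseen tokens
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


def prefilter(lines, unigrams, bigrams, threshold):
    """Keep lines whose perplexity exceeds the threshold (assumed direction)."""
    return [l for l in lines if perplexity(l, unigrams, bigrams) > threshold]


# Illustrative data only: routine lines to fit the model, then a mixed batch.
routine = ["connection accepted from client"] * 50 + ["heartbeat ok"] * 50
uni, bi = train_bigram(routine)
batch = ["heartbeat ok", "fatal segfault in worker thread"]
suspicious = prefilter(batch, uni, bi, threshold=5.0)
```

Under these assumptions, lines that resemble the routine traffic score a low perplexity and are dropped, while lines with unfamiliar token sequences score high and are kept for finer-grained classification. The threshold value is arbitrary here and would need tuning on real logs.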
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Anomaly Detection Techniques and Applications
