Making Sense of Failure Logs in an Industrial DevOps Environment
Muhammad Abbas, Ali Hamayouni, Mahshid Helali Moghadam, Mehrdad Saadatmand, Per Erik Strandberg

TL;DR
This paper introduces LogGrouper, an automated NLP-based clustering method for failure logs in industrial DevOps, which improves root cause analysis efficiency and reduces redundant review efforts.
Contribution
The paper presents a novel NLP and clustering approach, LogGrouper, for automated grouping of failure logs in industrial environments, enhancing root cause analysis.
Findings
LogGrouper produces good-quality groupings as measured by two clustering metrics, the Silhouette Coefficient and the Calinski-Harabasz Index.
Experts find the log groups useful for root cause analysis.
The approach effectively reduces manual effort in failure log review.
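To make the first finding concrete, here is a minimal sketch of how a Silhouette Coefficient can be computed for a grouping of failure logs. It is not LogGrouper's pipeline: this summary does not say which text representation or clustering algorithm the paper uses, so the sketch assumes a plain bag-of-words vector and cosine distance as stand-ins. The log lines and the function names (`vectorize`, `cosine_distance`, `silhouette`) are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean


def vectorize(log):
    """Bag-of-words counts over lowercase alphabetic tokens (an assumed stand-in)."""
    return Counter(re.findall(r"[a-z]+", log.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors, clamped at 0."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return max(0.0, 1.0 - dot / norm) if norm else 1.0


def silhouette(vectors, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (v, lab) in enumerate(zip(vectors, labels)):
        # Collect distances from point i to every other point, keyed by cluster.
        dists = {}
        for j, (w, other) in enumerate(zip(vectors, labels)):
            if i != j:
                dists.setdefault(other, []).append(cosine_distance(v, w))
        if lab not in dists:  # point i is alone in its cluster
            scores.append(0.0)
            continue
        a = mean(dists[lab])  # mean distance within the point's own cluster
        b = min(mean(d) for l, d in dists.items() if l != lab)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Hypothetical failure logs: two timeouts and two assertion failures.
logs = [
    "ERROR: connection timeout to device board-7",
    "ERROR: connection timeout to device board-9",
    "AssertionError: expected voltage 5 got 3",
    "AssertionError: expected voltage 5 got 4",
]
vectors = [vectorize(log) for log in logs]

good_score = silhouette(vectors, [0, 0, 1, 1])  # groups follow the shared cause
bad_score = silhouette(vectors, [0, 1, 0, 1])   # groups mix the two causes
print(good_score, bad_score)
```

On this toy data the grouping that follows the shared cause scores close to 1, while the mixed grouping scores below 0. That is the sense in which a higher Silhouette Coefficient indicates tighter, better-separated log groups.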
Abstract
Processing and reviewing nightly test execution failure logs for large industrial systems is a tedious activity. Furthermore, multiple failures might share one root/common cause during test execution sessions, and the review might therefore require redundant efforts. This paper presents the LogGrouper approach for automated grouping of failure logs to aid root/common cause analysis and for enabling the processing of each log group as a batch. LogGrouper uses state-of-the-art natural language processing and clustering approaches to achieve meaningful log grouping. The approach is evaluated in an industrial setting in both a qualitative and quantitative manner. Results show that LogGrouper produces good quality groupings in terms of our two evaluation metrics (Silhouette Coefficient and Calinski-Harabasz Index) for clustering quality. The qualitative evaluation shows that experts perceive the…
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Data Quality and Management
