Analyzing Logs of Large-Scale Software Systems using Time Curves Visualization
Dmytro Borysenkov, Adriano Vogel, Sören Henning, Esteban Perez-Wohlfeil

TL;DR
This paper presents a novel pipeline combining clustering, LLM summarization, event detection, and Time Curves to efficiently analyze and visualize large-scale software system logs, revealing key events, trends, and outliers.
Contribution
It introduces a semimetric distance for meaningful log event similarity measurement and integrates multiple techniques into a holistic, automated log analysis framework.
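As a rough illustration of the idea, the sketch below shows what a semimetric is (non-negative, symmetric, zero self-distance, triangle inequality not required) and how a pairwise distance matrix between time windows can be projected to 2D with classical Multidimensional Scaling, the projection step that Time Curves builds on. The paper's actual distance is not reproduced here: squared Euclidean distance is used only as a well-known stand-in semimetric, and the `windows` data and all function names are hypothetical.

```python
import numpy as np

def squared_euclidean(X):
    """Pairwise squared Euclidean distance between rows of X.
    Stand-in semimetric (assumption): it violates the triangle inequality."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)

def is_semimetric(D, tol=1e-9):
    """Check non-negativity, symmetry, and zero self-distance.
    The triangle inequality is deliberately not checked."""
    D = np.asarray(D, dtype=float)
    return bool(
        np.all(D >= -tol)
        and np.allclose(D, D.T, atol=tol)
        and np.allclose(np.diag(D), 0.0, atol=tol)
    )

def classical_mds(D, k=2):
    """Classical MDS: embed n items in k dimensions from a distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    lam = np.clip(eigvals[order], 0.0, None)  # drop negative eigenvalues
    return eigvecs[:, order] * np.sqrt(lam)

# Hypothetical data: each row is a time window, each column the count
# of one log template observed in that window.
windows = np.array([
    [10, 0, 1],
    [9, 1, 1],
    [0, 8, 2],
    [10, 0, 1],
])

D = squared_euclidean(windows)
curve = classical_mds(D, k=2)  # row i is the 2D position of window i
```

Connecting the rows of `curve` in chronological order gives the curve itself: windows with similar log content land close together, so a return to an earlier system state shows up as the curve revisiting an earlier position.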
Findings
Effective explanation of main log events across applications
Detection of trends and outliers in large distributed systems
Significant reduction in log analysis time
Abstract
Logs are crucial for analyzing large-scale software systems, offering insights into system health, performance, security threats, and potential bugs. However, their chaotic nature, characterized by sheer volume, lack of standards, and variability, makes manual analysis complex. Clustering algorithms can assist by grouping logs into a smaller set of templates, but lose the temporal and relational context in doing so. In contrast, Large Language Models (LLMs) can provide meaningful explanations but struggle to process large collections efficiently. Moreover, representation techniques for both approaches are typically limited to either plain text or traditional charting, especially when dealing with large-scale systems. In this paper, we combine clustering and LLM summarization with event detection and Multidimensional Scaling through the…
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research
