Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics
Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, and Michael R. Lyu

TL;DR
Loghub is a comprehensive collection of 19 real-world system log datasets designed to advance AI-driven log analytics research and facilitate benchmarking in the field.
Contribution
This paper introduces loghub, a large, publicly available collection of system log datasets, addressing the lack of open datasets and benchmarks for AI-driven log analytics.
Findings
Loghub datasets have been downloaded approximately 90,000 times.
Benchmarking results demonstrate the utility of loghub for AI log analysis.
The datasets cover diverse system types, supporting broad research applications.
Abstract
Logs have been widely adopted in software system development and maintenance because of the rich runtime information they record. In recent years, the increase of software size and complexity leads to the rapid growth of the volume of logs. To handle these large volumes of logs efficiently and effectively, a line of research focuses on developing intelligent and automated log analysis techniques. However, only a few of these techniques have reached successful deployments in industry due to the lack of public log datasets and open benchmarking upon them. To fill this significant gap and facilitate more research on AI-driven log analytics, we have collected and released loghub, a large collection of system log datasets. In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems, including distributed systems, supercomputers, operating…
Taxonomy
Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Cloud Computing and Resource Management
