CrashEventLLM: Predicting System Crashes with Large Language Models

Priyanka Mudgal; Bijan Arbab; Swaathi Sampath Kumar

arXiv:2407.15716 · cs.DC · October 24, 2024 · 2 cites

Open Access

TL;DR

This paper explores using large language models to predict system crashes from logs and to explain their likely causes, drawing on historical crash data and expert annotations to improve system reliability.

Contribution

The paper introduces CrashEventLLM, a large language model framework for crash prediction and cause analysis based on system log data.

Findings

01. CrashEventLLM effectively predicts future crash events.

02. The model offers insights into potential causes of crashes.

03. Preliminary results show promising accuracy in failure prediction.

Abstract

As the dependence on computer systems expands across various domains, focusing on personal, industrial, and large-scale applications, there arises a compelling need to enhance their reliability to sustain business operations seamlessly and ensure optimal user satisfaction. System logs generated by these devices serve as valuable repositories of historical trends and past failures. The use of machine learning techniques for failure prediction has become commonplace, enabling the extraction of insights from past data to anticipate future behavior patterns. Recently, large language models have demonstrated remarkable capabilities in tasks including summarization, reasoning, and event prediction. Therefore, in this paper, we endeavor to investigate the potential of large language models in predicting system failures, leveraging insights learned from past failure behavior to inform reasoning…

Taxonomy

Topics: Software System Performance and Reliability · Data Quality and Management · Digital and Cyber Forensics