LogEval: A Comprehensive Benchmark Suite for Large Language Models in Log Analysis

Tianyu Cui; Shiyu Ma; Ziang Chen; Tong Xiao; Shimin Tao; Yilun Liu; Shenglin Zhang; Duoming Lin; Changchang Liu; Yuzhe Cai; Weibin Meng; Yongqian Sun; Dan Pei

arXiv:2407.01896·cs.CL·July 3, 2024·2 cites

TL;DR

LogEval is a new benchmark suite that systematically evaluates large language models on various log analysis tasks, revealing their strengths and weaknesses in this critical domain.

Contribution

This paper introduces LogEval, the first comprehensive benchmark suite for assessing LLMs in log analysis tasks across multiple dimensions and languages.

Findings

01. LLMs show varying performance across log analysis tasks.

02. Prompt engineering significantly impacts LLM effectiveness.

03. Multilingual evaluation highlights language-specific strengths and weaknesses.

Abstract

Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maintenance script generation, and alert information summarization. However, the performance of current LLMs in log analysis tasks remains inadequately validated. To address this gap, we introduce LogEval, a comprehensive benchmark suite designed to evaluate the capabilities of LLMs in various log analysis tasks for the first time. This benchmark covers tasks such as log parsing, log anomaly detection, log fault diagnosis, and log summarization. LogEval evaluates each task using 4,000 publicly…

Code & Models

Repositories

LinDuoming/LogEval (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling