LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis
Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, Dan Pei

TL;DR
LogEval is a new benchmark suite that systematically evaluates large language models on various log analysis tasks, revealing their strengths and weaknesses in this critical domain.
Contribution
This paper introduces LogEval, the first comprehensive benchmark suite for assessing LLMs in log analysis tasks across multiple dimensions and languages.
Findings
LLMs show varying performance across log analysis tasks.
Prompt engineering significantly impacts LLM effectiveness.
Multilingual evaluation highlights language-specific strengths and weaknesses.
Abstract
Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maintenance script generation, and alert information summarization. However, the performance of current LLMs in log analysis tasks remains inadequately validated. To address this gap, we introduce LogEval, a comprehensive benchmark suite designed to evaluate the capabilities of LLMs in various log analysis tasks for the first time. This benchmark covers tasks such as log parsing, log anomaly detection, log fault diagnosis, and log summarization. LogEval evaluates each task using 4,000 publicly…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
