CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity

Zhengmin Yu, Jiutian Zeng, Siyi Chen, Wenhan Xu, Dandan Xu, Xiangyu Liu, Zonghao Ying, Nan Wang, Yuan Zhang, and Min Yang

arXiv:2411.16239·cs.CR·January 20, 2025

TL;DR

CS-Eval is a comprehensive, bilingual benchmark for evaluating large language models on cybersecurity tasks. It spans diverse categories and cognitive levels, and evaluations with it reveal how model performance differs across tasks and improves over time.

Contribution

Introduces CS-Eval, the first extensive, publicly accessible cybersecurity benchmark for LLMs, encompassing diverse categories and cognitive levels to evaluate and compare model performance.

Findings

01. GPT-4 generally outperforms other models
02. Some models excel in specific subcategories
03. LLMs show significant performance improvements over months

Abstract

Over the past year, there has been a notable rise in the use of large language models (LLMs) for academic research and industrial practice within the cybersecurity field. However, there remains a lack of comprehensive and publicly accessible benchmarks for evaluating the performance of LLMs on cybersecurity tasks. To address this gap, we introduce CS-Eval, a publicly accessible, comprehensive and bilingual LLM benchmark specifically designed for cybersecurity. CS-Eval synthesizes research hotspots from academia and practical applications from industry, curating a diverse set of high-quality questions across 42 categories within cybersecurity, systematically organized into three cognitive levels: knowledge, ability, and application. Through an extensive evaluation of a wide range of LLMs using CS-Eval, we have uncovered valuable insights. For instance, while GPT-4 generally excels…
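The abstract describes questions organized along two axes, 42 categories and three cognitive levels (knowledge, ability, application), which suggests reporting scores per category and per level. The following is a minimal sketch of such a breakdown, not CS-Eval's official scoring code: the `Item` record, the `accuracy_by` helper, the example category names, and the case-insensitive exact-match scoring rule are all hypothetical assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three cognitive levels named in the CS-Eval abstract.
LEVELS = ("knowledge", "ability", "application")


@dataclass
class Item:
    # Hypothetical record layout; not the official CS-Eval schema.
    category: str    # one of the benchmark's categories
    level: str       # one of LEVELS
    answer: str      # gold answer, e.g. an option letter
    prediction: str  # the model's already-parsed answer

    def __post_init__(self):
        # Reject anything outside the three cognitive levels.
        if self.level not in LEVELS:
            raise ValueError(f"unknown cognitive level: {self.level!r}")


def accuracy_by(items, key):
    """Return exact-match accuracy grouped by an Item attribute ('level' or 'category')."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for it in items:
        group = getattr(it, key)
        total[group] += 1
        # Assumed scoring rule: case-insensitive exact match after stripping whitespace.
        if it.prediction.strip().upper() == it.answer.strip().upper():
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}


if __name__ == "__main__":
    # Illustrative items only; category names are placeholders.
    items = [
        Item("Network Security", "knowledge", "B", "B"),
        Item("Network Security", "application", "C", "A"),
        Item("Malware Analysis", "ability", "D", "d"),
        Item("Malware Analysis", "knowledge", "A", "C"),
    ]
    print(accuracy_by(items, "level"))
    print(accuracy_by(items, "category"))
```

Grouping by `level` gives a knowledge/ability/application profile for a model, while grouping by `category` surfaces the kind of subcategory strengths noted in the findings.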

Code & Models

Repositories

cs-eval/cs-eval (Official)

Taxonomy

Topics: Information and Cyber Security · Network Security and Intrusion Detection · Digital and Cyber Forensics