CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity
Zhengmin Yu, Jiutian Zeng, Siyi Chen, Wenhan Xu, Dandan Xu, Xiangyu Liu, Zonghao Ying, Nan Wang, Yuan Zhang, and Min Yang

TL;DR
CS-Eval is a comprehensive, bilingual benchmark designed to evaluate large language models on cybersecurity tasks, covering diverse categories and cognitive levels, revealing insights into model performance and improvements over time.
Contribution
Introduces CS-Eval, the first extensive, publicly accessible cybersecurity benchmark for LLMs, encompassing diverse categories and cognitive levels to evaluate and compare model performance.
Findings
GPT-4 generally outperforms other models
Some models excel in specific subcategories
LLMs show significant performance improvements over months
Abstract
Over the past year, there has been a notable rise in the use of large language models (LLMs) for academic research and industrial practice within the cybersecurity field. However, there remains a lack of comprehensive and publicly accessible benchmarks to evaluate the performance of LLMs on cybersecurity tasks. To address this gap, we introduce CS-Eval, a publicly accessible, comprehensive, and bilingual LLM benchmark specifically designed for cybersecurity. CS-Eval synthesizes research hotspots from academia and practical applications from industry, curating a diverse set of high-quality questions across 42 categories within cybersecurity, systematically organized into three cognitive levels: knowledge, ability, and application. Through an extensive evaluation of a wide range of LLMs using CS-Eval, we have uncovered valuable insights. For instance, while GPT-4 generally excels…
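The abstract describes questions grouped by category and by cognitive level, which suggests reporting accuracy per group rather than as one overall number. The following is a minimal sketch of that kind of aggregation, not the authors' evaluation code. The record layout, the function name `accuracy_by`, and all model names, category names, and values are hypothetical and made up for illustration.

```python
from collections import defaultdict

# Hypothetical graded records: (model, category, cognitive_level, is_correct).
# Every name and value here is invented for illustration; none is CS-Eval data.
records = [
    ("model_a", "category_1", "knowledge", True),
    ("model_a", "category_1", "knowledge", True),
    ("model_a", "category_2", "application", False),
    ("model_a", "category_2", "application", True),
    ("model_b", "category_1", "knowledge", True),
    ("model_b", "category_1", "knowledge", False),
    ("model_b", "category_2", "application", True),
    ("model_b", "category_2", "application", True),
]


def accuracy_by(records, key_index):
    """Return {(model, group): accuracy}, grouping by the field at key_index."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        key = (rec[0], rec[key_index])
        totals[key] += 1
        correct[key] += int(rec[3])
    return {key: correct[key] / totals[key] for key in totals}


# Index 1 groups by category, index 2 by cognitive level.
by_category = accuracy_by(records, 1)
by_level = accuracy_by(records, 2)

for (model, group), acc in sorted(by_category.items()):
    print(f"{model:8s} {group:12s} {acc:.2f}")
```

In this toy data the two models have the same overall accuracy but lead in different categories, which is the kind of pattern a per-category breakdown makes visible.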
Code & Models
- 🤗 clouditera/secgpt (model) · 140 dl · ♡ 102
- 🤗 clouditera/SecGPT-1.5B (model) · 72 dl · ♡ 1
- 🤗 clouditera/SecGPT-7B (model) · 68 dl · ♡ 4
- 🤗 clouditera/SecGPT-14B (model) · 1.9k dl · ♡ 3
- 🤗 clouditera/SecGPT-7B-GGUF (model) · 73 dl
- 🤗 clouditera/SecGPT-1.5B-GGUF (model) · 29 dl
- 🤗 clouditera/SecGPT-14B-GGUF (model) · 77 dl
- 🤗 Nitish-Garikoti/secgpt (model) · 23 dl
Taxonomy
Topics: Information and Cyber Security · Network Security and Intrusion Detection · Digital and Cyber Forensics
