SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity
Pengfei Jing, Mengyun Tang, Xiaorong Shi, Xing Zheng, Sen Nie, Shi Wu, Yong Yang, Xiapu Luo

TL;DR
SecBench is a large, multi-dimensional dataset designed to evaluate LLMs specifically in cybersecurity, covering various question types, languages, and difficulty levels to better assess domain-specific capabilities.
Contribution
This paper introduces SecBench, the largest and most comprehensive cybersecurity benchmark dataset for LLMs, with diverse question formats, languages, and expert-level content.
Findings
Benchmarking 16 SOTA LLMs demonstrates SecBench's usability.
SecBench includes over 47,000 cybersecurity questions in multiple formats and languages.
Automatic evaluation methods enable scalable assessment of LLM performance.
Abstract
Evaluating Large Language Models (LLMs) is crucial for understanding their capabilities and limitations across various applications, including natural language processing and code generation. Existing benchmarks like MMLU, C-Eval, and HumanEval assess general LLM performance but lack focus on specific expert domains such as cybersecurity. Previous attempts to create cybersecurity datasets have faced limitations, including insufficient data volume and a reliance on multiple-choice questions (MCQs). To address these gaps, we propose SecBench, a multi-dimensional benchmarking dataset designed to evaluate LLMs in the cybersecurity domain. SecBench includes questions in various formats (MCQs and short-answer questions (SAQs)), at different capability levels (Knowledge Retention and Logical Reasoning), in multiple languages (Chinese and English), and across various sub-domains. The dataset…
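The dimensions named in the abstract (question format, capability level, language, sub-domain) can be pictured as fields on a question record, with multiple-choice accuracy broken down along any one of them. The following is a minimal sketch under stated assumptions, not SecBench's actual schema or evaluation code: the `Question` class, its field names, the `mcq_accuracy_by` helper, and the sample questions and sub-domain labels are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Hypothetical record layout; field names are illustrative, not SecBench's schema."""
    qid: str
    fmt: str        # "MCQ" or "SAQ"
    level: str      # "KR" (Knowledge Retention) or "LR" (Logical Reasoning)
    language: str   # "zh" or "en"
    subdomain: str  # made-up label for this example
    answer: str     # for MCQs, the correct option letter, e.g. "B"


def mcq_accuracy_by(questions, predictions, key):
    """Exact-match accuracy of MCQ predictions, grouped by one Question attribute."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        if q.fmt != "MCQ":
            continue  # short-answer questions need a different grader
        group = getattr(q, key)
        total[group] += 1
        # A missing prediction counts as wrong.
        pred = predictions.get(q.qid, "").strip().upper()
        if pred == q.answer.strip().upper():
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}


# Made-up sample data, not drawn from the dataset.
questions = [
    Question("q1", "MCQ", "KR", "en", "network security", "B"),
    Question("q2", "MCQ", "LR", "en", "network security", "D"),
    Question("q3", "MCQ", "KR", "zh", "data security", "A"),
    Question("q4", "SAQ", "LR", "zh", "data security", "free-text reference answer"),
]
predictions = {"q1": "b", "q2": "A", "q3": "A"}

by_language = mcq_accuracy_by(questions, predictions, "language")
by_level = mcq_accuracy_by(questions, predictions, "level")
```

Grouping by a single attribute keeps the sketch short; the same loop could key on a tuple of attributes to get a cross-tabulated breakdown. Scoring short-answer questions is deliberately left out, since exact matching does not apply to free-text answers.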
Taxonomy
Topics: Data Quality and Management · Cloud Data Security Solutions · Privacy-Preserving Technologies in Data
